ZERO BIASING AND A DISCRETE CENTRAL LIMIT THEOREM

By Larry Goldstein and Aihua Xia 

University of Southern California and University of Melbourne 

We introduce a new family of distributions to approximate $\mathbb{P}(W \in A)$ for $A \subset \{\ldots, -2, -1, 0, 1, 2, \ldots\}$ and $W$ a sum of independent integer-valued random variables $\xi_1, \xi_2, \ldots, \xi_n$ with finite second moments, where, with large probability, $W$ is not concentrated on a lattice of span greater than 1. The well-known Berry-Esseen theorem states that, for $Z$ a normal random variable with mean $\mathbb{E}(W)$ and variance $\mathrm{Var}(W)$, $\mathbb{P}(Z \in A)$ provides a good approximation to $\mathbb{P}(W \in A)$ for $A$ of the form $(-\infty, x]$. However, for more general $A$, such as the set of all even numbers, the normal approximation becomes unsatisfactory and it is desirable to have an appropriate discrete, nonnormal distribution which approximates $W$ in total variation, and a discrete version of the Berry-Esseen theorem to bound the error. In this paper, using the concept of zero biasing for discrete random variables (cf. Goldstein and Reinert [J. Theoret. Probab. 18 (2005) 237-260]), we introduce a new family of discrete distributions and provide a discrete version of the Berry-Esseen theorem showing how members of the family approximate the distribution of a sum $W$ of integer-valued variables in total variation.

1. Introduction. We introduce a new family of distributions to approximate $\mathbb{P}(W \in A)$ for $A$ a subset of $\mathbb{Z} = \{\ldots, -2, -1, 0, 1, 2, \ldots\}$ and $W$ a sum of independent integer-valued random variables $\xi_1, \xi_2, \ldots, \xi_n$ with finite second moments, where the probability that $W$ is not concentrated on a lattice of span greater than 1 is large. When $A$ is of the form $(-\infty, x]$ and the $\xi_i$'s have finite third moments, we can use the well-known Berry-Esseen theorem ([7] and [15]), which states that there exists an absolute constant $C$ such that

$$\sup_{x \in \mathbb{R}}\left|\mathbb{P}(W \le x) - \Phi\left(\frac{x - \mu}{\sigma}\right)\right| \le \frac{C}{\sigma^3}\sum_{i=1}^n \mathbb{E}|\xi_i - \mathbb{E}\xi_i|^3,$$

where $\mu = \mathbb{E}(W)$, $\sigma^2 = \mathrm{Var}(W)$ and $\Phi$ is the cumulative distribution function of the standard normal. If the $\xi_i$'s are identically distributed, then the bound is of the order $n^{-1/2}$, which is known to be the best possible. However, for more general $A$, such as the set of all even numbers, the errors of normal approximation may be large, or difficult to compute. For such cases it is desirable to have a distribution which approximates $W$ in total variation, and a discrete version of the Berry-Esseen theorem to evaluate the error. Moreover, approximations in total variation have the property that any function of $W$ is approximated in total variation at least as well as $W$ itself, an advantage not enjoyed by the Kolmogorov distance.

A few discrete distributions, such as signed compound Poisson measures 
and translated Poisson distributions (see [6, 9] and references therein) have 
been proposed to make very close approximations in total variation to the 
distribution of W. These approximations can be viewed as modifications of 
Poisson approximation and in applications, one often transforms the sum W 
into a form which can be approximated reasonably well by a suitably chosen 
Poisson random variable. In estimating the errors of approximation, besides 
the assumption that W has large probability of not being concentrated on 
a lattice of span greater than 1, one also needs other assumptions, such 
as existence of the third moments of the $\xi_i$'s ([6], Theorem 4.3), and may
additionally introduce truncation. Another approach is to define a discrete 
normal Y by 

$$\mathbb{P}(Y = j) = \mathbb{P}(j - 1/2 < Z \le j + 1/2), \qquad Z \sim N(\mu, \sigma^2), \quad j \in \mathbb{Z}$$

(L. H. Y. Chen, personal communication), though it is not clear what quality 
of approximation Y can achieve. 
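The discrete normal $Y$ above is straightforward to tabulate. The Python sketch below is our own illustration, not part of the cited proposal. It computes $\mathbb{P}(Y = j)$ from the normal distribution function via `math.erf`. The helper names and the truncation window of 12 standard deviations are assumptions made for the example.

```python
import math

def normal_cdf(x: float, mu: float, sigma2: float) -> float:
    """Distribution function of N(mu, sigma2) evaluated at x."""
    return 0.5 * (1.0 + math.erf((x - mu) / math.sqrt(2.0 * sigma2)))

def discrete_normal_pmf(mu: float, sigma2: float, width: float = 12.0) -> dict:
    """P(Y = j) = P(j - 1/2 < Z <= j + 1/2) with Z ~ N(mu, sigma2).

    Only integers within `width` standard deviations of mu are kept; the
    mass outside this window is negligible for the widths used here.
    """
    sd = math.sqrt(sigma2)
    lo, hi = math.floor(mu - width * sd), math.ceil(mu + width * sd)
    return {j: normal_cdf(j + 0.5, mu, sigma2) - normal_cdf(j - 0.5, mu, sigma2)
            for j in range(lo, hi + 1)}

pmf = discrete_normal_pmf(mu=2.3, sigma2=1.7)
total = sum(pmf.values())
mean = sum(j * p for j, p in pmf.items())
var = sum((j - mean) ** 2 * p for j, p in pmf.items())
print(f"total={total:.12f} mean={mean:.6f} var={var:.6f}")
```

For these parameters the mean of $Y$ agrees with $\mu$ to many digits. Its variance, however, is close to $\sigma^2 + 1/12$ (Sheppard's correction for rounding) rather than $\sigma^2$. So this simple discretization does not reproduce both moments of $W$ exactly.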

In this paper we propose a class of approximating distributions which have 
carrier space Z, thus avoiding truncation and integerization problems. These 
new distributions are uniquely determined by parameters $\mu$ and $\sigma^2$, similarly
to how the approximating normal distribution is determined in the classical 
central limit theorem. It is expected that any such approximating family of 
discrete distributions be related to the Poisson, a distribution characterized 
by the property of being equal to its own reduced Palm distribution; see 
[23], page 93. As this property is intrinsic in the study of certain Poisson 
approximations [1, 11], and since the Palm distribution involves only the 
first moment of the distribution, it is of interest to determine whether there 
exists any counterpart to the Poisson also involving the second moment, 
which gives additional flexibility in approximation. One appropriate counterpart can be uncovered through the concept of zero biasing [20]. Based on
the continuous normal case, it is expected that the class of approximating 
distributions should arrive naturally as the unique candidates which equal 
their zero-biased distribution. However, because of the discrete setting, some 
adjustments are first needed to make the idea work. In Section 2, we provide some background on zero biasing in both the continuous and discrete
settings, and define our approximating family of distributions through a 
modified zero biasing form. In particular, our distributions are related to 
the operator (2.11), connected to discrete zero biasing, and are the station- 
ary laws of the processes with corresponding generator (3.3), similarly to 
how normal laws are related to an operator connected to continuous zero 
biasing, and are the stationary distributions of the Ornstein-Uhlenbeck pro- 
cesses. Next, in Section 3 we establish the Stein equation and employ the 
bilateral birth and death processes ([28], Chapter 8) to estimate the Stein 
factors, in a similar fashion to that in [8]. In Section 4 a general approxi- 
mation theorem is given which provides a bound in total variation between 
an integer-valued random variable Y and a member of the family of our 
approximating distributions, in terms of the distance between Y and its 
zero-biased distribution, paralleling Lemma 2.1 in [17] for the continuous 
case. The bound is of the same order as the normal approximation when the 
weaker Kolmogorov metric is used. The general theorem is then applied to
obtain a bound for the approximation of a sum W of independent integer- 
valued random variables under (only) second moment conditions, yielding a 
form which simplifies further under the assumption of finite third moments. 

2. Zero biasing and characterization of the approximating distribution. 

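For any nonnegative random variable $X$ with mean $\mathbb{E}(X) = \mu \in (0, \infty)$ and distribution $dF(x)$, the $X$-size biased distribution is given by

$$dF^s(x) = \frac{x\,dF(x)}{\mu}, \tag{2.1}$$

or, equivalently, by the characterizing equation

$$\mathbb{E}[Xf(X)] = \mu\,\mathbb{E}f(X^s) \quad \text{for all } f \text{ with } \mathbb{E}|Xf(X)| < \infty.$$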

It is often helpful to think of size biasing as a transformation defined on non- 
negative distributions with finite mean. Size biasing can appear (unwanted 
and sometimes unnoticed) in various sampling contexts [13]; for example, in 
random digit dialing, where F in (2.1) is the uniform distribution on tele- 
phone numbers, it is twice as likely to dial a household with x = 2 telephone 
lines than a household where $x = 1$. When $X$ is a nonnegative integer-valued random variable with positive finite mean $\mu$, the $X$-size biased distribution (2.1) specializes to

$$\mathbb{P}(X^s = k) = \frac{k\,\mathbb{P}(X = k)}{\mu}, \qquad k = 0, 1, \ldots. \tag{2.2}$$

The counterpart of size biasing in point process theory is the family of Palm distributions (see [23], Chapter 10), introduced by Palm in 1943. It is easily verified that $X$ has a Poisson distribution if and only if $\mathcal{L}(X^s) = \mathcal{L}(X + 1)$. This fact can be used to study Poisson approximation and is part of the foundation of the well-known Stein-Chen method (see [10] or [4]).
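The Poisson characterization just stated can be checked numerically from (2.2). The Python sketch below is an illustration built on our own choices: the helper names, the truncation of the Poisson support at 60, and the binomial comparison law are all assumptions. It builds the size-biased law and compares it with the law of $X + 1$.

```python
import math

def size_biased_pmf(pmf: dict) -> dict:
    """Size-biased law (2.2): P(X^s = k) = k P(X = k) / E(X) for X >= 0."""
    mean = sum(k * p for k, p in pmf.items())
    return {k: k * p / mean for k, p in pmf.items() if k > 0}

def poisson_pmf(lam: float, kmax: int) -> dict:
    return {k: math.exp(-lam) * lam ** k / math.factorial(k) for k in range(kmax + 1)}

def gap_to_shift(pmf: dict) -> float:
    """max_k |P(X^s = k) - P(X + 1 = k)| over the support of X + 1."""
    xs = size_biased_pmf(pmf)
    return max(abs(xs.get(k, 0.0) - pmf.get(k - 1, 0.0)) for k in range(1, max(pmf) + 2))

pois_gap = gap_to_shift(poisson_pmf(3.5, kmax=60))
binom_pmf = {k: math.comb(5, k) * 0.4 ** k * 0.6 ** (5 - k) for k in range(6)}
binom_gap = gap_to_shift(binom_pmf)
print(pois_gap, binom_gap)  # the first is at rounding-error level, the second is not
```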

One notable property of the size-biased transformation is that a sum of 
independent nonnegative random variables can be size-biased by replacing 
a single summand, chosen with probability proportional to its mean, with 
one independent of the remaining variables and having that summand's size-biased distribution. That is, with $\xi_i$ independent nonnegative variables with finite mean $\mathbb{E}\xi_i = \mu_i$, $i = 1, \ldots, n$, and

$$W = \sum_{i=1}^n \xi_i, \quad \text{we have} \quad W^s = W - \xi_I + \xi_I^s,$$

where $I$ is a random index, independent of $\xi_1, \ldots, \xi_n$, with distribution

$$\mathbb{P}(I = i) = \frac{\mu_i}{\sum_{j=1}^n \mu_j}.$$

For $\xi$ a nontrivial indicator variable, (2.2) shows that $\xi^s = 1$. Hence, a sum of independent indicators $\xi_i$, $i = 1, \ldots, n$, can be size-biased by setting a single indicator, chosen with probability proportional to $\mathbb{E}\xi_i$, to one.
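The replacement recipe for sums of indicators can be illustrated with the Python sketch below. The success probabilities `ps` are arbitrary and the helper name `law_of_sum` is ours. The sketch computes the size-biased law of $W$ directly from (2.2), then computes it again by choosing $I$ with probability $\mathbb{E}\xi_I/\mathbb{E}W$ and setting $\xi_I = 1$, and compares the two.

```python
def law_of_sum(ps):
    """Exact law of a sum of independent Bernoulli(p) variables, by convolution."""
    law = {0: 1.0}
    for p in ps:
        new = {}
        for w, q in law.items():
            new[w] = new.get(w, 0.0) + q * (1 - p)
            new[w + 1] = new.get(w + 1, 0.0) + q * p
        law = new
    return law

ps = [0.1, 0.35, 0.6, 0.8]
mu = sum(ps)
law = law_of_sum(ps)

# Size-biased law of W from (2.2).
direct = {k: k * q / mu for k, q in law.items() if k > 0}

# Replacement recipe: pick I with P(I = i) = p_i / mu and set xi_I = 1.
recipe = {}
for i, p_i in enumerate(ps):
    rest = law_of_sum(ps[:i] + ps[i + 1:])   # law of W - xi_i, independent of I
    for w, q in rest.items():
        recipe[w + 1] = recipe.get(w + 1, 0.0) + (p_i / mu) * q

max_err = max(abs(direct.get(k, 0.0) - recipe.get(k, 0.0)) for k in range(len(ps) + 1))
print(max_err)
```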

The zero bias transformation was introduced in [20], based both on its 
similarity to the size-biased transformation and the following characteriza- 
tion of the mean zero normal distribution given in [26], which forms the 
basis of Stein's celebrated method for normal approximation [27]: Z is a 
mean zero, variance $\sigma^2$ normal variable if and only if, for all absolutely continuous $f$ with $\mathbb{E}|Zf(Z)| < \infty$,

$$\mathbb{E}[Zf(Z)] = \sigma^2\,\mathbb{E}f'(Z).$$

For any $Y$ with $\mathbb{E}Y = 0$ and $\mathrm{Var}(Y) = \sigma^2$, Goldstein and Reinert [20] prove that there exists $Y^*$, called the $Y$-zero biased distribution, such that, for all absolutely continuous $f$ with $\mathbb{E}|Yf(Y)| < \infty$,

$$\mathbb{E}[Yf(Y)] = \sigma^2\,\mathbb{E}f'(Y^*). \tag{2.3}$$

By the Stein (if and only if) characterization, it is clear that Y has the 
mean zero normal distribution if and only if $\mathcal{L}(Y) = \mathcal{L}(Y^*)$. In other words,
the mean zero normal distribution is the unique fixed point of the zero 
bias transformation. Heuristically, then, one can show that Y is close to 
normal by showing that Y is close to Y*; for in this case, Y itself is close to 
being a fixed point and, therefore, should be close to the unique fixed point, 
the normal. For this reason, it is key that zero biasing enjoys a property 
similar to the one mentioned above which holds for size biasing. A sum Y 
of independent mean zero variables $\xi_1, \ldots, \xi_n$ with finite variances $\sigma_1^2, \ldots, \sigma_n^2$ can be zero-biased by choosing a variable using an independent index $I$ with distribution

$$\mathbb{P}(I = i) = \frac{\sigma_i^2}{\sigma^2}, \qquad \sigma^2 = \sum_{j=1}^n \sigma_j^2,$$

which takes values with probability proportional to variance, and replacing the selected variable with one from that summand's zero-biased distribution which is independent of the remaining variables, that is,

$$Y^* = Y - \xi_I + \xi_I^*. \tag{2.4}$$

Hence, a sum of roughly comparable independent mean zero variables with 
finite variances is close in distribution to normal, since its zero bias distribu- 
tion differs from its original one by only one comparable summand of many. 
Applications of the zero bias transformation include simple random sampling (see [20]), hierarchical structures (see [17]), combinatorial central limit theorems [18] and the computation of $L^1$ bounds to the normal [20].

Goldstein and Reinert [21] show that both size biasing and zero biasing are special cases of distributional transformations specified by a biasing function $P$ and an order $m$. Both size biasing and zero biasing have $P(x) = x$, with orders $m = 0$ and $m = 1$, respectively. Such transformations are often related to families of orthogonal polynomials. To approximate by a given distribution, one can often construct a transformation for which it is a fixed point. The transformations for which discrete distributions will be fixed points have derivative replaced by difference. In particular, with $\Delta f(i) := f(i + 1) - f(i)$, the Poisson distribution with mean $\lambda$ is a fixed point of the transformation characterized by

$$\mathbb{E}[(Y - \lambda)f(Y)] = \lambda\,\mathbb{E}\Delta f(Y^*).$$
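The difference form of the Poisson characterization can also be checked numerically. In the Python sketch below, the test function $f$, the truncation points and the comparison geometric law are our own illustrative choices. The helper `stein_chen_gap` evaluates $\mathbb{E}[(Y-\lambda)f(Y)] - \lambda\,\mathbb{E}\Delta f(Y)$, which vanishes when $Y$ is Poisson with mean $\lambda$, so that $Y^*$ may be taken equal in law to $Y$.

```python
import math

def stein_chen_gap(pmf, lam, f):
    """E[(Y - lam) f(Y)] - lam * E[f(Y + 1) - f(Y)] for Y with pmf on {0, 1, ...}."""
    lhs = sum((k - lam) * f(k) * p for k, p in enumerate(pmf))
    rhs = lam * sum((f(k + 1) - f(k)) * p for k, p in enumerate(pmf))
    return lhs - rhs

lam = 2.0
f = lambda i: math.cos(0.7 * i)          # an arbitrary bounded test function
poisson = [math.exp(-lam) * lam ** k / math.factorial(k) for k in range(80)]
q = lam / (1 + lam)                      # geometric law on {0, 1, ...} with mean lam
geometric = [(1 - q) * q ** k for k in range(400)]

pois_gap = stein_chen_gap(poisson, lam, f)
geom_gap = stein_chen_gap(geometric, lam, f)
print(pois_gap, geom_gap)
```

The geometric law has the same mean but not the same variance as the Poisson, and the identity fails for it.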

However, one obtains additional flexibility by not insisting that the mean 
and variance be equal. Therefore, parallel to (2.3), we give the following 
definition: 

Definition 2.1. For an integer-valued random variable $Y$ with mean $\mu$ and finite variance $\sigma^2$, we say that $Y^*$ has the discrete $Y$-zero biased distribution if, for all bounded functions $f: \mathbb{Z} \to \mathbb{R}$,

$$\mathbb{E}[(Y - \mu)f(Y)] = \sigma^2\,\mathbb{E}\Delta f(Y^*). \tag{2.5}$$

It is easily verified that (2.4) holds for the discrete zero bias transfor- 
mation, that is, that a sum of independent discrete random variables can 
be discrete zero-biased by replacing one variable, chosen with probability 
proportional to variance, by a variable from that summand's discrete zero 
bias distribution, independent of the remaining variables. When no confusion between the discrete and continuous cases can arise, we simply say that $Y^*$ has the $Y$-zero biased distribution.

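For $Y$ an integer-valued random variable with finite mean and variance, the existence and uniqueness of $\mathcal{L}(Y^*)$ can be proved as follows. For each $j \in \mathbb{Z}$, let $f_j(i) = \mathbf{1}_{[j, \infty)}(i)$, so that

$$\Delta f_j(i) = \begin{cases} 1 & \text{for } i = j - 1, \\ 0 & \text{for } i \ne j - 1, \end{cases}$$

giving

$$\mathbb{P}(Y^* = j - 1) = \frac{\mathbb{E}[Y\mathbf{1}_{[j, \infty)}(Y)] - \mu\,\mathbb{P}(Y \ge j)}{\sigma^2} = \frac{\mathbb{E}[(Y - \mu)\mathbf{1}(Y \ge j)]}{\sigma^2}. \tag{2.6}$$

For $j > \mu$, (2.6) is clearly nonnegative. The identity $\mathbb{E}[(Y - \mu)\mathbf{1}(Y \ge j)] = -\mathbb{E}[(Y - \mu)\mathbf{1}(Y < j)]$ implies that (2.6) is also nonnegative for $j \le \mu$. Using this identity and the facts that $\sum_{j=1}^{\infty}\mathbf{1}(Y \ge j) = Y\mathbf{1}(Y \ge 1)$ and $\sum_{j=-\infty}^{0}\mathbf{1}(Y < j) = -Y\mathbf{1}(Y < 0)$, we have the probabilities in (2.6) summing to one, since

$$\sum_{j=1}^{\infty}\mathbb{E}[(Y - \mu)\mathbf{1}(Y \ge j)] - \sum_{j=-\infty}^{0}\mathbb{E}[(Y - \mu)\mathbf{1}(Y < j)] = \mathbb{E}[(Y - \mu)Y\mathbf{1}(Y \ge 1) + (Y - \mu)Y\mathbf{1}(Y < 0)] = \mathbb{E}[(Y - \mu)Y] = \sigma^2.$$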

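For $\eta$ an indicator variable with $\mathrm{Var}(\eta) = \theta^2 = (1 - \mathbb{E}\eta)\mathbb{E}\eta > 0$, (2.6) shows that $\eta^* = 0$:

$$\mathbb{P}(\eta^* = 0) = \frac{\mathbb{E}[(\eta - \mathbb{E}\eta)\mathbf{1}(\eta \ge 1)]}{\theta^2} = \frac{\mathbb{E}[(\eta - \mathbb{E}\eta)\mathbf{1}(\eta = 1)]}{\theta^2} = \frac{(1 - \mathbb{E}\eta)\mathbb{P}(\eta = 1)}{\theta^2} = 1. \tag{2.7}$$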



Though true in this particular case, it is incorrect to conclude from this 
example that $\eta^* = \eta^s - 1$, that is, that the discrete zero bias operation is the
same as the reduced Palm. For an independent sum, the Palm distribution 
is obtained by replacing a summand chosen proportional to its mean, but 
to achieve the zero bias distribution, one chooses proportional to variance. 
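Formula (2.6) gives a direct way to compute discrete zero-biased laws for finitely supported $Y$. The Python sketch below is an illustration, and its helper names, the two summand laws and the test function are hypothetical choices. It checks three things:

- the characterization (2.5);
- the replace-one-summand construction (2.4) for a sum of two independent summands, with the summand chosen proportionally to its variance;
- the indicator case (2.7).

```python
import math

def mean_var(pmf):
    mu = sum(k * p for k, p in pmf.items())
    return mu, sum((k - mu) ** 2 * p for k, p in pmf.items())

def discrete_zero_bias(pmf):
    """(2.6): P(Y* = j - 1) = E[(Y - mu) 1(Y >= j)] / sigma^2, for finitely supported Y."""
    mu, var = mean_var(pmf)
    lo, hi = min(pmf), max(pmf)    # outside lo < j <= hi the expectation vanishes
    return {j - 1: sum((k - mu) * p for k, p in pmf.items() if k >= j) / var
            for j in range(lo + 1, hi + 1)}

def convolve(a, b):
    out = {}
    for x, p in a.items():
        for y, q in b.items():
            out[x + y] = out.get(x + y, 0.0) + p * q
    return out

xi1 = {-1: 0.2, 0: 0.5, 2: 0.3}
xi2 = {0: 0.6, 1: 0.1, 3: 0.3}
y = convolve(xi1, xi2)
y_star = discrete_zero_bias(y)

# (2.5) with a bounded test function f.
f = lambda i: math.sin(i) + 0.5 * (i % 3 == 0)
mu_y, var_y = mean_var(y)
lhs = sum((k - mu_y) * f(k) * p for k, p in y.items())
rhs = var_y * sum((f(k + 1) - f(k)) * p for k, p in y_star.items())

# (2.4): replace summand I, chosen with probability proportional to variance.
v1, v2 = mean_var(xi1)[1], mean_var(xi2)[1]
recipe = {}
for w, part in ((v1, convolve(discrete_zero_bias(xi1), xi2)),
                (v2, convolve(xi1, discrete_zero_bias(xi2)))):
    for k, p in part.items():
        recipe[k] = recipe.get(k, 0.0) + w / (v1 + v2) * p
sum_err = max(abs(y_star.get(k, 0.0) - recipe.get(k, 0.0)) for k in set(y_star) | set(recipe))

# (2.7): an indicator is zero-biased to the point mass at 0.
eta_star = discrete_zero_bias({0: 0.7, 1: 0.3})
print(abs(lhs - rhs), sum_err, eta_star)
```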
Since the fixed points of the continuous zero bias transformation (2.3)
are the mean zero normal distributions, it is of immediate interest to deter- 
mine which distributions, if any, are fixed points of the discrete zero bias 
transformation (2.5), that is, to find which S satisfy 

$$\mathbb{E}[(S - \mu)f(S)] = \sigma^2\,\mathbb{E}\Delta f(S) \tag{2.8}$$

for all bounded functions $f$ on $\mathbb{Z}$. We show now that, unlike the situation in the continuous case, distributional fixed points do not exist for all choices of $\mu$ and $\sigma^2$. It is for this reason that we introduce the family of distributions given in Lemma 2.2. Using $f_j(i) = \mathbf{1}(i = j)$ for $j \in \mathbb{Z}$ in (2.8), we have

$$(\sigma^2 + j - \mu)\,\mathbb{P}(S = j) = \sigma^2\,\mathbb{P}(S = j - 1), \qquad j \in \mathbb{Z}. \tag{2.9}$$

There are two cases to check for (2.9), depending on whether or not $\mu - \sigma^2$ is an integer. Let $k := \min\{i : i \ge \mu - \sigma^2\}$. When $\mu - \sigma^2$ is an integer, (2.9) gives that $\mathbb{P}(S = j) = 0$ for $j < k$. When $\mu - \sigma^2$ is not an integer, however, the values $\mathbb{P}(S = k - 1), \mathbb{P}(S = k - 2), \ldots$ strictly alternate in sign unless $\mathcal{L}(S)$ is the null measure on $\mathbb{Z}$. This follows from (2.9), since $\sigma^2 + j - \mu < 0$ for $j \le k - 1$. That is, $\mathcal{L}(S)$ is a signed measure which takes on both positive and negative values. To avoid such a signed measure when $\mu - \sigma^2$ is not an integer, we truncate the distribution at, for example, $k$, so that $\mathbb{P}(S = j) = 0$ for $j < k$. In that case (2.9) fails for $j = k$, and $S$ is only approximately a fixed point of the discrete zero bias transformation. In either case, whether $\mu - \sigma^2$ is an integer or $\mu - \sigma^2$ is not an integer and we truncate at $k$, iteration of (2.9) yields

$$\mathbb{P}(S = j) = \left(\prod_{i=k+1}^{j}\frac{\sigma^2}{\sigma^2 + i - \mu}\right)\mathbb{P}(S = k), \qquad j \ge k + 1.$$

If $\mu - \sigma^2$ is an integer, we now see that $S$ corresponds to a translated Poisson ([6], page 131). That is, the distribution of $S$ equals that of $Y + \mu - \sigma^2$ with $Y$ a Poisson random variable with mean $\sigma^2$. Further, elementary calculations using (2.9) yield

$$\mathbb{E}S = \mu\,\mathbb{P}(S \ge k + 1) + (\sigma^2 + k)\,\mathbb{P}(S = k), \tag{2.10}$$

so $\mathbb{E}S = \mu$ if and only if $\mu - \sigma^2$ is an integer. Since (2.10) will not be used later on, we omit the details.
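The truncated solution of (2.9) and the mean formula (2.10) can be checked numerically. In the Python sketch below, the helper name and the cutoff of 200 terms above $k$ are our own assumptions. The first parameter pair has $\mu - \sigma^2$ non-integer, so $\mathbb{E}S \ne \mu$. The second has $\mu - \sigma^2$ an integer, where $S$ is a translated Poisson.

```python
import math

def truncated_fixed_point(mu, sigma2, terms=200):
    """Iterate (2.9) upward from k = min{i : i >= mu - sigma2}; P(S = j) = 0 for j < k."""
    k = math.ceil(mu - sigma2)
    weights = {k: 1.0}
    for j in range(k + 1, k + terms + 1):
        weights[j] = weights[j - 1] * sigma2 / (sigma2 + j - mu)
    total = sum(weights.values())
    return k, {j: w / total for j, w in weights.items()}

# mu - sigma2 = 3.3 is not an integer: check (2.10) and that E S differs from mu.
mu, sigma2 = 5.3, 2.0
k, pmf = truncated_fixed_point(mu, sigma2)
es = sum(j * p for j, p in pmf.items())
rhs_210 = mu * sum(p for j, p in pmf.items() if j >= k + 1) + (sigma2 + k) * pmf[k]

# mu - sigma2 = 3 is an integer: S - 3 should be Poisson with mean sigma2 = 2.
k0, pmf0 = truncated_fixed_point(5.0, 2.0)
pois_err = max(abs(pmf0[k0 + m] - math.exp(-2.0) * 2.0 ** m / math.factorial(m))
               for m in range(30))
print(k, es, rhs_210, k0, pois_err)
```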

Using a truncated approximating distribution, such as $S$ above, leaves $\mathbb{P}(W < k)$ in the upper bound when we estimate the error caused by approximating $W$ by $S$. This can become quite inconvenient in applications ([4], Section 9.2 and [2]). To avoid truncation, we introduce a two-parameter family of distributions which have carrier space $\mathbb{Z}$, similar to the two-parameter normal distributions which have carrier space $\mathbb{R}$. For an integer $\kappa$ with $\mu - \sigma^2 \le \kappa < \mu + \sigma^2 + 1$, define the operator

$$\mathcal{B}f(i) = \begin{cases} \sigma^2 f(i+1) - (\sigma^2 + i - \mu)f(i) = \sigma^2\Delta f(i) - (i - \mu)f(i), & i \ge \kappa, \\ (\sigma^2 + \mu - i)f(i+1) - \sigma^2 f(i) = \sigma^2\Delta f(i) - (i - \mu)f(i+1), & i \le \kappa - 1, \end{cases} \tag{2.11}$$

for all bounded functions $f$ on $\mathbb{Z}$. Note that $\sigma^2 + i - \mu$ and $\sigma^2 + \mu - i$ are nonnegative over their respective ranges $i \ge \kappa$ and $i \le \kappa - 1$. They are strictly positive except when $\mu - \sigma^2$ is an integer and $i = \kappa = \mu - \sigma^2$. The following lemmas are devoted to the properties of $\mathcal{L}(S) := \Psi_\kappa(\mu, \sigma^2)$, the distribution characterized by $\mathcal{B}$:

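Lemma 2.2. There exists a unique distribution $\mathcal{L}(S) = \Psi_\kappa(\mu, \sigma^2)$, characterized by $\mathbb{E}\mathcal{B}f(S) = 0$ for all bounded functions $f$ on $\mathbb{Z}$, whose probabilities $\pi_i = \mathbb{P}(S = i)$, $i \in \mathbb{Z}$, satisfy

$$\pi_\kappa = \left\{\sum_{j=\kappa+1}^{\infty}\prod_{i=\kappa+1}^{j}\frac{\sigma^2}{\sigma^2 + i - \mu} + 1 + \frac{\sigma^2 + \kappa - \mu}{\sigma^2 + \mu - \kappa + 1}\left[1 + \sum_{j=-\infty}^{\kappa-2}\prod_{i=j}^{\kappa-2}\frac{\sigma^2}{\sigma^2 + \mu - i}\right]\right\}^{-1} \tag{2.12}$$

and

$$\pi_j = \begin{cases} \left(\prod_{i=\kappa+1}^{j}\dfrac{\sigma^2}{\sigma^2 + i - \mu}\right)\pi_\kappa, & j \ge \kappa + 1, \\ \dfrac{\sigma^2 + \kappa - \mu}{\sigma^2 + \mu - \kappa + 1}\,\pi_\kappa, & j = \kappa - 1, \\ \left(\prod_{i=j}^{\kappa-2}\dfrac{\sigma^2}{\sigma^2 + \mu - i}\right)\dfrac{\sigma^2 + \kappa - \mu}{\sigma^2 + \mu - \kappa + 1}\,\pi_\kappa, & j \le \kappa - 2. \end{cases}$$

Moreover, $\mathbb{E}(|S|^l) < \infty$ for $0 < l < \infty$, and $\mathbb{E}\mathcal{B}f(S) = 0$ for all $f$ such that $\mathbb{E}\{|S|[|f(S)| + |f(S + 1)|]\} < \infty$.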



Remark. When $\mu - \sigma^2$ is an integer and $\kappa = \mu - \sigma^2$, the distribution $\Psi_\kappa(\mu, \sigma^2)$ reduces to that of the translated Poisson variable $S$ above, with $k = \kappa$.

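Proof of Lemma 2.2. Fix $j$ and let $\mathbf{1}_j(i) = \mathbf{1}(i = j)$. Since

$$0 = \mathbb{E}[\mathcal{B}\mathbf{1}_j(S)] = \sum_{i \in \mathbb{Z}}\mathcal{B}\mathbf{1}_j(i)\,\pi_i,$$

we obtain recursive formulae as follows:

$$-(\sigma^2 + j - \mu)\pi_j + \sigma^2\pi_{j-1} = 0, \qquad j \ge \kappa + 1, \tag{2.13}$$

$$-(\sigma^2 + j - \mu)\pi_j + (\sigma^2 + \mu - j + 1)\pi_{j-1} = 0, \qquad j = \kappa, \tag{2.14}$$

$$-\sigma^2\pi_j + (\sigma^2 + \mu - j + 1)\pi_{j-1} = 0, \qquad j \le \kappa - 1. \tag{2.15}$$

Hence,

$$\pi_j = \left(\prod_{i=\kappa+1}^{j}\frac{\sigma^2}{\sigma^2 + i - \mu}\right)\pi_\kappa, \qquad j \ge \kappa + 1,$$

$$\pi_{\kappa-1} = \frac{\sigma^2 + \kappa - \mu}{\sigma^2 + \mu - \kappa + 1}\,\pi_\kappa \tag{2.16}$$

and

$$\pi_{j-1} = \left(\prod_{i=j-1}^{\kappa-2}\frac{\sigma^2}{\sigma^2 + \mu - i}\right)\pi_{\kappa-1}, \qquad j \le \kappa - 1.$$

Replacing $j$ by $j + 1$ in the last identity, it follows from (2.16) that

$$\pi_j = \left(\prod_{i=j}^{\kappa-2}\frac{\sigma^2}{\sigma^2 + \mu - i}\right)\left(\frac{\sigma^2 + \kappa - \mu}{\sigma^2 + \mu - \kappa + 1}\right)\pi_\kappa, \qquad j \le \kappa - 2.$$

Summing the probabilities to one yields (2.12). Consider, say, the sum in (2.12) over $j \ge \kappa + 1$. Its convergence is guaranteed by the fact that $\sigma^2/(\sigma^2 + i - \mu) \le \sigma^2/(i - \kappa)$ for all $i \ge \kappa + 1$, together with $\sum_{j=\kappa+1}^{\infty}\sigma^{2(j-\kappa)}/(j - \kappa)! < \infty$. Hence, the distribution of $S$ exists and is uniquely determined by (2.12) and the recursions above.

The claim $\mathbb{E}(|S|^l) < \infty$ follows from the fact that

$$\sum_{j \ge \kappa+1}|j|^l\pi_j \le \pi_\kappa\sum_{j \ge \kappa+1}|j|^l\frac{\sigma^{2(j-\kappa)}}{(j - \kappa)!} < \infty$$

and

$$\sum_{j \le \kappa-2}|j|^l\pi_j \le \pi_{\kappa-1}\sum_{j \le \kappa-2}|j|^l\frac{\sigma^{2(\kappa-j-1)}}{(\kappa - j - 1)!} < \infty.$$

Finally, take $f_n = (f \wedge n) \vee (-n)$, $n = 1, 2, \ldots$. Then $\mathbb{E}\mathcal{B}f_n(S) = 0$ and

$$|\mathcal{B}f_n(i)| \le (|i| + |\mu| + 2\sigma^2)[|f(i)| + |f(i + 1)|].$$

Hence, letting $n \to \infty$, the dominated convergence theorem ensures that $\mathbb{E}\mathcal{B}f(S) = 0$. □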
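The recursions in the proof translate directly into a numerical recipe. The Python sketch below builds $\Psi_\kappa(\mu,\sigma^2)$ from (2.13), (2.16) and (2.15), and checks $\mathbb{E}\mathcal{B}f(S) = 0$ for every admissible $\kappa$ with $\mu - \sigma^2 < \kappa < \mu + \sigma^2 + 1$. The helper names, the window of 60 states on each side of $\kappa$ and the test function are our own assumptions.

```python
import math

def psi_pmf(mu, sigma2, kappa, width=60):
    """Psi_kappa(mu, sigma2) from (2.13), (2.16), (2.15), normalized on [kappa - width, kappa + width]."""
    w = {kappa: 1.0}
    for j in range(kappa + 1, kappa + width + 1):                       # (2.13)
        w[j] = w[j - 1] * sigma2 / (sigma2 + j - mu)
    w[kappa - 1] = (sigma2 + kappa - mu) / (sigma2 + mu - kappa + 1)    # (2.16)
    for j in range(kappa - 2, kappa - width - 1, -1):                   # (2.15)
        w[j] = w[j + 1] * sigma2 / (sigma2 + mu - j)
    total = sum(w.values())
    return {j: v / total for j, v in w.items()}

def apply_B(f, i, mu, sigma2, kappa):
    """The operator (2.11) applied to f at the point i."""
    if i >= kappa:
        return sigma2 * (f(i + 1) - f(i)) - (i - mu) * f(i)
    return sigma2 * (f(i + 1) - f(i)) - (i - mu) * f(i + 1)

mu, sigma2 = 3.3, 2.5
f = lambda i: math.atan(i) + math.cos(i)        # a bounded test function
worst = 0.0
for kappa in range(math.floor(mu - sigma2) + 1, math.ceil(mu + sigma2 + 1)):
    pmf = psi_pmf(mu, sigma2, kappa)
    ebf = sum(apply_B(f, i, mu, sigma2, kappa) * p for i, p in pmf.items())
    worst = max(worst, abs(ebf))
print(worst)
```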

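Lemma 2.3. $\mathbb{E}(S) = \mu$ and $\mathrm{Var}(S) = \sigma^2 + (\sigma^2 + \kappa - \mu)\pi_\kappa$.

Proof. Let $f(i) = 1$. Since $\mathcal{B}f(i) = \mu - i$ for all $i$, $\mathbb{E}\mathcal{B}f(S) = 0$ yields $\mathbb{E}S = \mu$. Next, let $f(i) = i$ in (2.11). For $i \ge \kappa$ we have $\mathcal{B}f(i) = \sigma^2 - i^2 + \mu i$, while for $i \le \kappa - 1$ we have $\mathcal{B}f(i) = \sigma^2 - i^2 + \mu i + \mu - i$. Both cases can be written as

$$\mathcal{B}f(i) = \sigma^2 - i^2 + \mu i + (\mu - i)\mathbf{1}(i \le \kappa - 1).$$

It follows from Lemma 2.2 that $\mathbb{E}\mathcal{B}f(S) = 0$, which yields

$$0 = \sigma^2 - \mathrm{Var}(S) + \mathbb{E}(\mu - S)\mathbf{1}(S \le \kappa - 1),$$

so that $\mathrm{Var}(S) = \sigma^2 + \mathbb{E}(\mu - S)\mathbf{1}(S \le \kappa - 1)$. Now, using $\mathbb{E}(S - \mu) = 0$, and (2.13) for the fourth equality, it follows that

$$\begin{aligned} \mathbb{E}(\mu - S)\mathbf{1}(S \le \kappa - 1) &= \mathbb{E}(S - \mu)\mathbf{1}(S \ge \kappa) = \sum_{i \ge \kappa}(i - \mu)\pi_i \\ &= \sum_{i \ge \kappa+1}(\sigma^2 + i - \mu)\pi_i - \sum_{i \ge \kappa+1}\sigma^2\pi_i + (\kappa - \mu)\pi_\kappa \\ &= \sum_{i \ge \kappa+1}\sigma^2\pi_{i-1} - \sum_{i \ge \kappa+1}\sigma^2\pi_i + (\kappa - \mu)\pi_\kappa \\ &= (\sigma^2 + \kappa - \mu)\pi_\kappa. \end{aligned}$$

Hence,

$$\mathrm{Var}(S) = \sigma^2 + (\sigma^2 + \kappa - \mu)\pi_\kappa. \qquad \square$$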

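Note that if we choose $\kappa = \min\{i : i \ge \mu - \sigma^2\}$, then $|\mathrm{Var}(S) - \sigma^2| \le \pi_\kappa$. The following lemma shows in what sense $S$ is close to a fixed point of the zero bias transformation when $\mathrm{Var}(S)$ is close to $\sigma^2$:

Lemma 2.4. The $S$-zero biased distribution $S^*$, given in Definition 2.1, satisfies

$$\mathbb{P}(S^* = j) = \begin{cases} \sigma^2\,\mathbb{P}(S = j)/\mathrm{Var}(S), & j \ge \kappa, \\ 1 - \sigma^2/\mathrm{Var}(S), & j = \kappa - 1, \\ \sigma^2\,\mathbb{P}(S = j + 1)/\mathrm{Var}(S), & j \le \kappa - 2. \end{cases}$$

Proof. Fix $j \ge \kappa$ and let $f_j(i) = \mathbf{1}_{[j+1, \infty)}(i)$. Then $\mathcal{B}f_j(i) = \sigma^2\Delta f_j(i) - (i - \mu)f_j(i)$ for all $i \in \mathbb{Z}$. Using the characterization equation $\mathbb{E}\mathcal{B}f_j(S) = 0$ and Definition 2.1,

$$0 = \mathbb{E}(\sigma^2\Delta f_j(S) - (S - \mu)f_j(S)) = \mathbb{E}(\sigma^2\Delta f_j(S) - \mathrm{Var}(S)\Delta f_j(S^*)).$$

Together with $\Delta f_j(i) = \mathbf{1}(i = j)$, this gives the claim for $j \ge \kappa$.

Likewise, fix $j \le \kappa - 2$ and let $f_j(i) = \mathbf{1}_{(-\infty, j+1]}(i)$. Then $\mathcal{B}f_j(i) = \sigma^2\Delta f_j(i) - (i - \mu)f_j(i + 1)$ for all $i \in \mathbb{Z}$, and

$$0 = \mathbb{E}(\sigma^2\Delta f_j(S) - (S - \mu)f_j(S + 1)) = \mathbb{E}(\sigma^2\Delta f_j(S) - \mathrm{Var}(S)\Delta f_j(S^* + 1)).$$

Together with $\Delta f_j(i) = -\mathbf{1}(i = j + 1)$, this gives the claim for $j \le \kappa - 2$. Finally, the value $\mathbb{P}(S^* = \kappa - 1)$ can be obtained from $\sum_{i=-\infty}^{\infty}\mathbb{P}(S^* = i) = 1$. □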
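Lemmas 2.3 and 2.4 can be spot-checked in the same way. The Python sketch below repeats a hypothetical `psi_pmf` helper so that it runs on its own, and its parameter values are arbitrary. It computes the mean and variance of $S$, then the zero-biased law $S^*$ from (2.6), and compares both with the closed forms above. The choice $\kappa = 2$ is deliberately not $\min\{i : i \ge \mu - \sigma^2\}$, so that $\mathrm{Var}(S)$ visibly differs from $\sigma^2$.

```python
def psi_pmf(mu, sigma2, kappa, width=60):
    """Psi_kappa(mu, sigma2) from (2.13), (2.16), (2.15), normalized on a finite window."""
    w = {kappa: 1.0}
    for j in range(kappa + 1, kappa + width + 1):
        w[j] = w[j - 1] * sigma2 / (sigma2 + j - mu)
    w[kappa - 1] = (sigma2 + kappa - mu) / (sigma2 + mu - kappa + 1)
    for j in range(kappa - 2, kappa - width - 1, -1):
        w[j] = w[j + 1] * sigma2 / (sigma2 + mu - j)
    total = sum(w.values())
    return {j: v / total for j, v in w.items()}

mu, sigma2, kappa = 3.3, 2.5, 2       # any kappa with mu - sigma2 < kappa < mu + sigma2 + 1
pmf = psi_pmf(mu, sigma2, kappa)
mean = sum(i * p for i, p in pmf.items())
var = sum((i - mean) ** 2 * p for i, p in pmf.items())
var_lemma_23 = sigma2 + (sigma2 + kappa - mu) * pmf[kappa]

# Zero-biased law of S via (2.6), using the actual mean and variance of S.
lo, hi = min(pmf), max(pmf)
s_star = {j - 1: sum((i - mean) * p for i, p in pmf.items() if i >= j) / var
          for j in range(lo + 1, hi + 1)}

def lemma_24(j):
    """Closed form for P(S* = j) from Lemma 2.4."""
    if j >= kappa:
        return sigma2 * pmf[j] / var
    if j == kappa - 1:
        return 1 - sigma2 / var
    return sigma2 * pmf[j + 1] / var

zb_err = max(abs(s_star[j] - lemma_24(j)) for j in s_star)
print(mean, var, var_lemma_23, zb_err)
```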

3. Stein's method and Stein's factors. Brown and Xia [8] introduced a class of approximating distributions $\pi$, determined by parameters $\alpha_i, \beta_i$, $i \in \mathbb{Z}_+ := \{0, 1, 2, \ldots\}$, satisfying

$$\pi_i\alpha_i = \pi_{i+1}\beta_{i+1}, \qquad i \in \mathbb{Z}_+. \tag{3.1}$$

Equation (3.1) enabled the authors of that work to view $\pi$ as the stationary distribution of a birth-death process and to give a neat probabilistic derivation of Stein magic factors, essentially under the condition that for each $k = 1, 2, \ldots$,

$$\alpha_k - \alpha_{k-1} \le \beta_k - \beta_{k-1}, \tag{3.2}$$

letting $\beta_0 = 0$. A key point in that derivation is that the solution to the Stein
equation is an explicit linear combination of mean upward and downward 
transition times of the birth-death process ([8], Lemma 2.1). Under condi- 
tion (3.2), all differences of the solution of the Stein equation are negative 
except one — an essential structure for the neat derivation of Stein magic 
factors for polynomial birth-death approximations, which include Poisson, 
binomial, negative binomial and hypergeometric approximations [8]. 

In this section we consider approximating distributions $\pi$ on $\mathbb{Z}$ (instead of $\mathbb{Z}_+$) which are determined by two parameters $\mu$ and $\sigma^2$ and which satisfy the balance equation (3.4). Analogously to the context in [8], we define a generator (3.3) of a bilateral birth-death process such that $\pi$ is its stationary distribution. In Lemma 3.4, we prove that all differences of the solution of the Stein equation are negative except one, and we derive the Stein magic factors.

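For each bounded function $g$ on $\mathbb{Z}$, writing $f(x + 1) = g(x + 1) - g(x)$, we have

$$\mathcal{B}f(i) = \mathcal{A}g(i) := \begin{cases} \sigma^2(g(i+1) - g(i)) + (\sigma^2 + i - \mu)(g(i-1) - g(i)), & i \ge \kappa, \\ (\sigma^2 + \mu - i)(g(i+1) - g(i)) + \sigma^2(g(i-1) - g(i)), & i \le \kappa - 1, \end{cases} \tag{3.3}$$

that is,

$$\mathcal{A}g(i) = \alpha_i(g(i + 1) - g(i)) + \beta_i(g(i - 1) - g(i)),$$

where

$$\alpha_i = \begin{cases} \sigma^2, & i \ge \kappa, \\ \sigma^2 + \mu - i, & i \le \kappa - 1, \end{cases} \qquad \beta_i = \begin{cases} \sigma^2 + i - \mu, & i \ge \kappa, \\ \sigma^2, & i \le \kappa - 1. \end{cases}$$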

$\mathcal{A}$ is the generator of bilateral birth and death processes ([28], Chapter 8) with "birth rates" specified by $\{\alpha_i : i \in \mathbb{Z}\}$ and "death rates" by $\{\beta_i : i \in \mathbb{Z}\}$. When $\mu - \sigma^2 < \kappa < \mu + \sigma^2 + 1$, all the $\alpha_i$'s and $\beta_i$'s are positive, and the bilateral birth and death processes are always nonexplosive and ergodic ([28], Chapter 8 and [12]). However, when $\mu - \sigma^2$ is an integer and $\kappa = \mu - \sigma^2$, we get $\beta_\kappa = 0$ (the only possible zero of all the transition rates). This means that all states in $(-\infty, \kappa - 1]$ are transient. In other words, a Markov chain at a state in $(-\infty, \kappa - 1]$ will move quickly into states in $[\kappa, \infty)$, and a Markov chain at a state in $[\kappa, \infty)$ will never visit states in $(-\infty, \kappa - 1]$. In this case, the approximating distribution is the same as a translated Poisson, which has been well treated in various papers (see [6, 9] and references therein).

Čekanavičius and Vaitkus [9] studied the translated Poisson (referred to as centered Poisson in the paper) approximation to the sum $W$ of independent indicator random variables, with $\lambda = \mathbb{E}(W)$ and $\lambda_2 = \lambda - \mathrm{Var}(W)$. Their approximating translated Poisson is the sum of $\lfloor\lambda_2\rfloor$, the integer part of $\lambda_2$, and a Poisson random variable with mean $\lambda - \lfloor\lambda_2\rfloor$. This distribution is a slight variation of our $S$, and a straightforward modification of the Stein-Chen method is used to estimate the approximation errors. Hence, from now on, we concentrate on the case where $\mu - \sigma^2 < \kappa < \mu + \sigma^2 + 1$.

It is a routine exercise to check that $\Psi_\kappa(\mu, \sigma^2)$ is the equilibrium distribution of the Markov chain with generator $\mathcal{A}$, and that it satisfies the following balance equation:

$$\alpha_i\pi_i = \beta_{i+1}\pi_{i+1}, \qquad \forall i \in \mathbb{Z}. \tag{3.4}$$
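The routine exercise above can be mirrored numerically. The Python sketch below solves the balance equation (3.4) outward from $\kappa$, using the rates $\alpha_i, \beta_i$ of (3.3). It then checks that the result has the ratio (2.16) from Lemma 2.2 and satisfies global balance at interior states: $\pi_{j-1}\alpha_{j-1} + \pi_{j+1}\beta_{j+1} - \pi_j(\alpha_j + \beta_j) = 0$. The helper names, the finite window and the parameter values are our own assumptions.

```python
def rates(i, mu, sigma2, kappa):
    """Birth rate alpha_i and death rate beta_i of the generator A in (3.3)."""
    if i >= kappa:
        return sigma2, sigma2 + i - mu
    return sigma2 + mu - i, sigma2

def balance_solution(mu, sigma2, kappa, width=60):
    """Solve alpha_i pi_i = beta_{i+1} pi_{i+1}, i.e. (3.4), outward from kappa and normalize."""
    w = {kappa: 1.0}
    for i in range(kappa, kappa + width):
        w[i + 1] = w[i] * rates(i, mu, sigma2, kappa)[0] / rates(i + 1, mu, sigma2, kappa)[1]
    for i in range(kappa - 1, kappa - width - 1, -1):
        w[i] = w[i + 1] * rates(i + 1, mu, sigma2, kappa)[1] / rates(i, mu, sigma2, kappa)[0]
    total = sum(w.values())
    return {i: v / total for i, v in w.items()}

mu, sigma2, kappa = 3.3, 2.5, 4
pi = balance_solution(mu, sigma2, kappa)

# The ratio pi_{kappa-1} / pi_kappa should match (2.16).
ratio_err = abs(pi[kappa - 1] / pi[kappa]
                - (sigma2 + kappa - mu) / (sigma2 + mu - kappa + 1))

# Global balance (pi A)_j = 0 at interior states j.
resid = 0.0
for j in range(kappa - 50, kappa + 50):
    a_prev, _ = rates(j - 1, mu, sigma2, kappa)
    _, b_next = rates(j + 1, mu, sigma2, kappa)
    a_j, b_j = rates(j, mu, sigma2, kappa)
    resid = max(resid, abs(pi[j - 1] * a_prev + pi[j + 1] * b_next - pi[j] * (a_j + b_j)))
print(ratio_err, resid)
```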

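Denote by $Z_i(t)$, $t \ge 0$, the Markov chain generated by $\mathcal{A}$ with initial value $i$, and define stopping times

$$\tau_i = \inf\{t : Z_i(t) \ne i\}, \qquad \tau_i^+ = \inf\{t : Z_i(t) = i + 1\}, \qquad \tau_i^- = \inf\{t : Z_i(t) = i - 1\}, \qquad i \in \mathbb{Z}. \tag{3.5}$$

Lemma 3.1. For every bounded function $h$ on $\mathbb{Z}$, the integral

$$g(i) := -\int_0^\infty\{\mathbb{E}[h(Z_i(t))] - \mathbb{E}[h(S)]\}\,dt$$

is well defined and satisfies the Stein identity

$$\mathcal{A}g(i) = h(i) - \mathbb{E}h(S).$$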
Proof. Split the bilateral birth-death process $Z_i$ at $\kappa$ into two ordinary birth-death processes. Each of the two processes is a standard linear model, hence exponentially ergodic, implying that the process $Z_i$ is also exponentially ergodic (see [12], Theorem 4.1, or [14], page 1679). More precisely, take $V(i) = 1 + |i - \kappa|$, $c_0 = 1$ and $b_0$ sufficiently large. Then condition (D) on page 1679 of [14] (see the remark after the statement of the condition) is satisfied. By the second paragraph of [14], page 1681, this means there is some $0 < \rho < 1$ such that, for all $i \in \mathbb{Z}$, there exists a finite constant $M_i$ with

$$\sum_{j \in \mathbb{Z}}|\mathbb{P}(Z_i(t) = j) - \pi_j| \le M_i\rho^t \quad \text{for all } t \ge 0.$$

Hence,

$$\int_0^\infty|\mathbb{E}[h(Z_i(t))] - \mathbb{E}[h(S)]|\,dt \le \sup_{j \in \mathbb{Z}}|h(j)|\int_0^\infty M_i\rho^t\,dt < \infty,$$

which ensures that $g$ is well defined.

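Next, the general theory of Markov processes ensures that, for $a > 0$,

$$(a - \mathcal{A})^{-1}(h - \mathbb{E}h(S))(i) = \int_0^\infty e^{-at}[\mathbb{E}h(Z_i(t)) - \mathbb{E}h(S)]\,dt$$

(see [16], page 10), and the Stein identity corresponds to the above equation when $a = 0$. A sketch of the proof of the Stein identity is as follows. We have

$$\tau_i \sim \exp(\alpha_i + \beta_i), \qquad \mathbb{P}(Z_i(\tau_i) = i + 1) = \frac{\alpha_i}{\alpha_i + \beta_i}, \qquad \mathbb{P}(Z_i(\tau_i) = i - 1) = \frac{\beta_i}{\alpha_i + \beta_i}.$$

Invoking the strong Markov property and momentarily ignoring integrability issues, we get

$$\begin{aligned} g(i) &= -\mathbb{E}\int_0^\infty[h(Z_i(t)) - \mathbb{E}h(S)]\,dt \\ &= -\mathbb{E}\int_0^{\tau_i}[h(Z_i(t)) - \mathbb{E}h(S)]\,dt - \mathbb{E}\int_{\tau_i}^{\infty}[h(Z_i(t)) - \mathbb{E}h(S)]\,dt \\ &= -\frac{h(i) - \mathbb{E}h(S)}{\alpha_i + \beta_i} - \mathbb{E}\int_0^\infty[h(Z_i(t + \tau_i)) - \mathbb{E}h(S)]\,dt \\ &= -\frac{h(i) - \mathbb{E}h(S)}{\alpha_i + \beta_i} - \frac{\alpha_i}{\alpha_i + \beta_i}\mathbb{E}\int_0^\infty[h(Z_{i+1}(t)) - \mathbb{E}h(S)]\,dt - \frac{\beta_i}{\alpha_i + \beta_i}\mathbb{E}\int_0^\infty[h(Z_{i-1}(t)) - \mathbb{E}h(S)]\,dt \\ &= -\frac{h(i) - \mathbb{E}h(S)}{\alpha_i + \beta_i} + \frac{\alpha_i}{\alpha_i + \beta_i}g(i + 1) + \frac{\beta_i}{\alpha_i + \beta_i}g(i - 1). \end{aligned} \tag{3.6}$$

After reorganizing the terms, this implies

$$h(i) - \mathbb{E}h(S) = \alpha_i(g(i + 1) - g(i)) + \beta_i(g(i - 1) - g(i)) = \mathcal{A}g(i),$$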

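as desired. To prove (3.6) rigorously, we use the strong Markov property. For each $0 < u < \infty$,

$$\begin{aligned} &-\mathbb{E}\int_0^u[h(Z_i(t)) - \mathbb{E}h(S)]\,dt \\ &\quad= -\mathbb{E}\int_0^{\tau_i \wedge u}[h(Z_i(t)) - \mathbb{E}h(S)]\,dt - \mathbb{E}\int_{\tau_i \wedge u}^{u}[h(Z_i(t)) - \mathbb{E}h(S)]\,dt \\ &\quad= [\mathbb{E}h(S) - h(i)]\,\mathbb{E}(\tau_i \wedge u) - \int_0^u\mathbb{E}\left\{\int_s^u[h(Z_i(t)) - \mathbb{E}h(S)]\,dt \,\Big|\, \tau_i = s\right\}\mathbb{P}(\tau_i \in ds) \\ &\quad= [\mathbb{E}h(S) - h(i)]\,\mathbb{E}(\tau_i \wedge u) - \int_0^u\mathbb{E}\left\{\int_0^{u-s}[h(Z_i(s + v)) - \mathbb{E}h(S)]\,dv \,\Big|\, \tau_i = s\right\}\mathbb{P}(\tau_i \in ds) \\ &\quad= [\mathbb{E}h(S) - h(i)]\,\mathbb{E}(\tau_i \wedge u) - \frac{\alpha_i}{\alpha_i + \beta_i}\int_0^u\left(\int_0^{u-s}\mathbb{E}[h(Z_{i+1}(v)) - \mathbb{E}h(S)]\,dv\right)\mathbb{P}(\tau_i \in ds) \\ &\qquad - \frac{\beta_i}{\alpha_i + \beta_i}\int_0^u\left(\int_0^{u-s}\mathbb{E}[h(Z_{i-1}(v)) - \mathbb{E}h(S)]\,dv\right)\mathbb{P}(\tau_i \in ds) \\ &\quad= [\mathbb{E}h(S) - h(i)]\,\mathbb{E}(\tau_i \wedge u) - \frac{\alpha_i}{\alpha_i + \beta_i}\int_0^u\mathbb{E}[h(Z_{i+1}(v)) - \mathbb{E}h(S)]\,\mathbb{P}(\tau_i \le u - v)\,dv \\ &\qquad - \frac{\beta_i}{\alpha_i + \beta_i}\int_0^u\mathbb{E}[h(Z_{i-1}(v)) - \mathbb{E}h(S)]\,\mathbb{P}(\tau_i \le u - v)\,dv. \end{aligned}$$

Letting $u \to \infty$ and applying the bounded convergence theorem yields (3.6). □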



For fixed $k_1, k_2 \in \mathbb{Z}$ with $k_1 \le k_2$, define

$$e_i^-(k_1, k_2) = \mathbb{E}\int_0^{\tau_i^-}\mathbf{1}_{[k_1, k_2]}(Z_i(t))\,dt \quad \text{and} \quad e_i^+(k_1, k_2) = \mathbb{E}\int_0^{\tau_i^+}\mathbf{1}_{[k_1, k_2]}(Z_i(t))\,dt,$$

the expected time that the Markov chain $Z_i(t)$ spends in $[k_1, k_2]$ before it reaches $i - 1$ and $i + 1$, respectively. We note that $e_i^-(-\infty, \infty) = \mathbb{E}\tau_i^-$ and $e_i^+(-\infty, \infty) = \mathbb{E}\tau_i^+$ are the mean downward and upward transition times introduced in [8], page 1378. Hence, the following lemma generalizes Lemma 2.2 in [8]:



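Lemma 3.2.

$$e_i^-(k_1, k_2) = \begin{cases} \dfrac{\sum_{l=i \vee k_1}^{k_2}\pi_l}{\beta_i\pi_i}, & \text{if } i \le k_2, \\ 0, & \text{if } i > k_2, \end{cases}$$

and

$$e_i^+(k_1, k_2) = \begin{cases} \dfrac{\sum_{l=k_1}^{i \wedge k_2}\pi_l}{\alpha_i\pi_i}, & \text{if } i \ge k_1, \\ 0, & \text{if } i < k_1. \end{cases}$$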



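Proof. Since $\tau_i \sim \exp(\alpha_i + \beta_i)$, and $\tau_i \le \tau_i^-$ by (3.5), we have

$$\begin{aligned} e_i^-(k_1, k_2) &= \mathbb{E}\int_0^{\tau_i}\mathbf{1}_{[k_1,k_2]}(Z_i(t))\,dt + \mathbb{E}\int_{\tau_i}^{\tau_i^-}\mathbf{1}_{[k_1,k_2]}(Z_i(t))\,dt \\ &= \frac{\mathbf{1}\{k_1 \le i \le k_2\}}{\alpha_i + \beta_i} + \mathbb{E}\left[\int_{\tau_i}^{\tau_i^-}\mathbf{1}_{[k_1,k_2]}(Z_i(t))\,dt \,\Big|\, \tau_i = \tau_i^-\right]\mathbb{P}(\tau_i = \tau_i^-) \\ &\quad + \mathbb{E}\left[\int_{\tau_i}^{\tau_i^-}\mathbf{1}_{[k_1,k_2]}(Z_i(t))\,dt \,\Big|\, \tau_i < \tau_i^-\right]\mathbb{P}(\tau_i < \tau_i^-). \end{aligned}$$

The second-to-last term is clearly zero. For the last term, given $\tau_i < \tau_i^-$, we have $Z_i(\tau_i) = i + 1$, so by the strong Markov property,

$$\mathbb{E}\left[\int_{\tau_i}^{\tau_i^-}\mathbf{1}_{[k_1,k_2]}(Z_i(t))\,dt \,\Big|\, \tau_i < \tau_i^-\right] = \mathbb{E}\int_0^{\tau_{i+1,i-1}}\mathbf{1}_{[k_1,k_2]}(Z_{i+1}(t))\,dt,$$

where $\tau_{j_1,j_2} = \inf\{t : Z_{j_1}(t) = j_2\}$. Now, again by the strong Markov property,

$$\mathbb{E}\int_0^{\tau_{i+1,i-1}}\mathbf{1}_{[k_1,k_2]}(Z_{i+1}(t))\,dt = \mathbb{E}\int_0^{\tau_{i+1}^-}\mathbf{1}_{[k_1,k_2]}(Z_{i+1}(t))\,dt + \mathbb{E}\int_{\tau_{i+1}^-}^{\tau_{i+1,i-1}}\mathbf{1}_{[k_1,k_2]}(Z_{i+1}(t))\,dt = e_{i+1}^-(k_1, k_2) + e_i^-(k_1, k_2).$$

Combining the equations above gives

$$e_i^-(k_1, k_2) = \frac{\mathbf{1}\{k_1 \le i \le k_2\}}{\alpha_i + \beta_i} + \frac{\alpha_i}{\alpha_i + \beta_i}\left(e_{i+1}^-(k_1, k_2) + e_i^-(k_1, k_2)\right).$$

Using (3.4), this implies

$$\pi_i\beta_i e_i^-(k_1, k_2) = \pi_i\mathbf{1}\{k_1 \le i \le k_2\} + \pi_i\alpha_i e_{i+1}^-(k_1, k_2) = \pi_i\mathbf{1}\{k_1 \le i \le k_2\} + \beta_{i+1}\pi_{i+1}e_{i+1}^-(k_1, k_2).$$

Clearly, $e_i^-(k_1, k_2) = 0$ for $i > k_2$, so

$$\pi_i\beta_i e_i^-(k_1, k_2) = \sum_{l=i}^{k_2}[\pi_l\beta_l e_l^-(k_1, k_2) - \pi_{l+1}\beta_{l+1}e_{l+1}^-(k_1, k_2)] = \sum_{l=i}^{k_2}\pi_l\mathbf{1}\{k_1 \le l \le k_2\},$$

which implies

$$e_i^-(k_1, k_2) = \begin{cases} \dfrac{\sum_{l=i}^{k_2}\pi_l}{\beta_i\pi_i}, & \text{if } k_1 \le i \le k_2, \\ \dfrac{\sum_{l=k_1}^{k_2}\pi_l}{\beta_i\pi_i}, & \text{if } i < k_1, \end{cases}$$

as desired.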
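Likewise,

$$e_i^+(k_1, k_2) = \mathbb{E}\int_0^{\tau_i}\mathbf{1}_{[k_1,k_2]}(Z_i(t))\,dt + \mathbb{E}\int_{\tau_i}^{\tau_i^+}\mathbf{1}_{[k_1,k_2]}(Z_i(t))\,dt = \frac{\mathbf{1}\{k_1 \le i \le k_2\}}{\alpha_i + \beta_i} + \frac{\beta_i}{\alpha_i + \beta_i}\left(e_{i-1}^+(k_1, k_2) + e_i^+(k_1, k_2)\right).$$

Together with (3.4), this gives

$$\pi_i\alpha_i e_i^+(k_1, k_2) = \pi_i\mathbf{1}\{k_1 \le i \le k_2\} + \pi_i\beta_i e_{i-1}^+(k_1, k_2) = \pi_i\mathbf{1}\{k_1 \le i \le k_2\} + \alpha_{i-1}\pi_{i-1}e_{i-1}^+(k_1, k_2).$$

We have that $e_i^+(k_1, k_2) = 0$ for $i < k_1$, so

$$\pi_i\alpha_i e_i^+(k_1, k_2) = \sum_{l=k_1}^{i}[\pi_l\alpha_l e_l^+(k_1, k_2) - \alpha_{l-1}\pi_{l-1}e_{l-1}^+(k_1, k_2)] = \sum_{l=k_1}^{i}\pi_l\mathbf{1}\{k_1 \le l \le k_2\},$$

again giving the claimed expression. □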
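Taking $k_1 = -\infty$ and $k_2 = \infty$ in Lemma 3.2 gives $\mathbb{E}\tau_i^- = \sum_{l \ge i}\pi_l/(\beta_i\pi_i)$ and $\mathbb{E}\tau_i^+ = \sum_{l \le i}\pi_l/(\alpha_i\pi_i)$. The Python sketch below compares these values with Monte Carlo estimates obtained by simulating the birth-death chain with rates (3.3). The parameters, helper names, seed and number of replications are our own choices. The simulated values should agree with the formulas only up to sampling error.

```python
import random

def rates(i, mu, sigma2, kappa):
    """Birth rate alpha_i and death rate beta_i from (3.3)."""
    if i >= kappa:
        return sigma2, sigma2 + i - mu
    return sigma2 + mu - i, sigma2

def stationary(mu, sigma2, kappa, width=60):
    """Psi_kappa(mu, sigma2) via the balance equation (3.4), on a finite window."""
    w = {kappa: 1.0}
    for i in range(kappa, kappa + width):
        w[i + 1] = w[i] * rates(i, mu, sigma2, kappa)[0] / rates(i + 1, mu, sigma2, kappa)[1]
    for i in range(kappa - 1, kappa - width - 1, -1):
        w[i] = w[i + 1] * rates(i + 1, mu, sigma2, kappa)[1] / rates(i, mu, sigma2, kappa)[0]
    total = sum(w.values())
    return {i: v / total for i, v in w.items()}

def mean_hitting_time(start, target, mu, sigma2, kappa, reps, rng):
    """Monte Carlo estimate of E inf{t : Z_start(t) = target} for the chain with rates (3.3)."""
    total = 0.0
    for _ in range(reps):
        z, t = start, 0.0
        while z != target:
            a, b = rates(z, mu, sigma2, kappa)
            t += rng.expovariate(a + b)                 # holding time ~ exp(alpha_z + beta_z)
            z += 1 if rng.random() < a / (a + b) else -1
        total += t
    return total / reps

mu, sigma2, kappa, i = 3.3, 2.5, 4, 3
pi = stationary(mu, sigma2, kappa)
alpha_i, beta_i = rates(i, mu, sigma2, kappa)
exact_down = sum(p for l, p in pi.items() if l >= i) / (beta_i * pi[i])   # E tau_i^-
exact_up = sum(p for l, p in pi.items() if l <= i) / (alpha_i * pi[i])    # E tau_i^+

rng = random.Random(12345)
mc_down = mean_hitting_time(i, i - 1, mu, sigma2, kappa, 40000, rng)
mc_up = mean_hitting_time(i, i + 1, mu, sigma2, kappa, 40000, rng)
print(exact_down, mc_down, exact_up, mc_up)
```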

Note that in the sequel we will only need the quantities $\mathbb{E}\tau_i^- = e_i^-(-\infty, \infty)$ and $\mathbb{E}\tau_i^+ = e_i^+(-\infty, \infty)$, since we will focus on the choice $\kappa = \min\{i : i \ge \mu\}$ and the total variation metric. For other cases, the general result in Lemma 3.2 is needed.
Since

$$\mathbb{P}(W \in A) - \mathbb{P}(S \in A) = \sum_{j \in A}[\mathbb{P}(W = j) - \mathbb{P}(S = j)],$$

we define

$$h_j(x) = \mathbf{1}_{\{j\}}(x) - \pi_j,$$

$g_j$ to be the solution of $\mathcal{A}g_j = h_j$, and $f_j(i) = g_j(i) - g_j(i - 1)$.



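Lemma 3.3. For each $j \in \mathbb{Z}$,

$$f_j(i) = \begin{cases} -\pi_j\dfrac{\sum_{l=-\infty}^{i-1}\pi_l}{\alpha_{i-1}\pi_{i-1}}, & \text{for } i \le j, \\ \pi_j\dfrac{\sum_{l=i}^{\infty}\pi_l}{\beta_i\pi_i}, & \text{for } i > j. \end{cases}$$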

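Proof. Using the strong Markov property, we have

$$\begin{aligned} \int_0^u\mathbb{E}h_j(Z_{i-1}(t))\,dt &= \mathbb{E}\int_0^{u \wedge \tau_{i-1}^+}h_j(Z_{i-1}(t))\,dt + \mathbb{E}\int_{u \wedge \tau_{i-1}^+}^{u}h_j(Z_{i-1}(t))\,dt \\ &= \mathbb{E}\int_0^{u \wedge \tau_{i-1}^+}h_j(Z_{i-1}(t))\,dt + \int_0^u\left\{\int_0^{u-s}\mathbb{E}h_j(Z_i(v))\,dv\right\}\mathbb{P}(\tau_{i-1}^+ \in ds), \end{aligned}$$

and letting $u \to \infty$ yields

$$\int_0^\infty\mathbb{E}h_j(Z_{i-1}(t))\,dt = \mathbb{E}\int_0^{\tau_{i-1}^+}h_j(Z_{i-1}(t))\,dt + \int_0^\infty\mathbb{E}h_j(Z_i(t))\,dt.$$

Hence, for $i \le j$,

$$\begin{aligned} f_j(i) &= g_j(i) - g_j(i - 1) = -\int_0^\infty\mathbb{E}h_j(Z_i(t))\,dt + \int_0^\infty\mathbb{E}h_j(Z_{i-1}(t))\,dt \\ &= \mathbb{E}\int_0^{\tau_{i-1}^+}h_j(Z_{i-1}(t))\,dt = -\pi_j\mathbb{E}\tau_{i-1}^+ = -\pi_j\frac{\sum_{l=-\infty}^{i-1}\pi_l}{\alpha_{i-1}\pi_{i-1}}. \end{aligned}$$

The next-to-last equality holds because $Z_{i-1}(t) \le i - 1 < j$ for $t < \tau_{i-1}^+$, so that $h_j(Z_{i-1}(t)) = -\pi_j$ there. The last equality follows from Lemma 3.2. Likewise, we use

$$\int_0^\infty\mathbb{E}h_j(Z_i(t))\,dt = \mathbb{E}\int_0^{\tau_i^-}h_j(Z_i(t))\,dt + \int_0^\infty\mathbb{E}h_j(Z_{i-1}(t))\,dt,$$

the fact that $Z_i(t) \ge i > j$ for $t < \tau_i^-$, and Lemma 3.2 again. It follows that, for $i > j$,

$$f_j(i) = -\int_0^\infty\mathbb{E}h_j(Z_i(t))\,dt + \int_0^\infty\mathbb{E}h_j(Z_{i-1}(t))\,dt = -\mathbb{E}\int_0^{\tau_i^-}h_j(Z_i(t))\,dt = \pi_j\mathbb{E}\tau_i^- = \pi_j\frac{\sum_{l=i}^{\infty}\pi_l}{\beta_i\pi_i}. \qquad \square$$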
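Since $\mathcal{A}g_j(i) = \alpha_i f_j(i+1) - \beta_i f_j(i)$, the explicit formula of Lemma 3.3 can be checked against the Stein equation $\mathcal{A}g_j = h_j$ without computing $g_j$ itself. The Python sketch below does this at interior states of a finite window. The helpers, parameters and window are our own assumptions.

```python
def rates(i, mu, sigma2, kappa):
    """Birth rate alpha_i and death rate beta_i from (3.3)."""
    if i >= kappa:
        return sigma2, sigma2 + i - mu
    return sigma2 + mu - i, sigma2

def stationary(mu, sigma2, kappa, width=60):
    """Psi_kappa(mu, sigma2) via the balance equation (3.4), on a finite window."""
    w = {kappa: 1.0}
    for i in range(kappa, kappa + width):
        w[i + 1] = w[i] * rates(i, mu, sigma2, kappa)[0] / rates(i + 1, mu, sigma2, kappa)[1]
    for i in range(kappa - 1, kappa - width - 1, -1):
        w[i] = w[i + 1] * rates(i + 1, mu, sigma2, kappa)[1] / rates(i, mu, sigma2, kappa)[0]
    total = sum(w.values())
    return {i: v / total for i, v in w.items()}

mu, sigma2, kappa = 3.3, 2.5, 4
pi = stationary(mu, sigma2, kappa)

def f_j(j, i):
    """f_j(i) = g_j(i) - g_j(i - 1) as given by Lemma 3.3."""
    if i <= j:
        alpha_prev = rates(i - 1, mu, sigma2, kappa)[0]
        return -pi[j] * sum(p for l, p in pi.items() if l <= i - 1) / (alpha_prev * pi[i - 1])
    beta_i = rates(i, mu, sigma2, kappa)[1]
    return pi[j] * sum(p for l, p in pi.items() if l >= i) / (beta_i * pi[i])

# Stein equation A g_j = h_j, written as alpha_i f_j(i + 1) - beta_i f_j(i) = 1(i = j) - pi_j.
worst = 0.0
for j in range(kappa - 5, kappa + 6):
    for i in range(kappa - 20, kappa + 21):
        alpha_i, beta_i = rates(i, mu, sigma2, kappa)
        lhs = alpha_i * f_j(j, i + 1) - beta_i * f_j(j, i)
        rhs = (1.0 if i == j else 0.0) - pi[j]
        worst = max(worst, abs(lhs - rhs))
print(worst)
```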
JO Pi"i 

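Lemma 3.4. For $A \subset \mathbb{Z}$, let

$$h_A(x) = \mathbf{1}_A(x) - \mathbb{P}(S \in A),$$

$g_A$ be the solution to $\mathcal{A}g_A = h_A$ and $f_A(i) = g_A(i) - g_A(i - 1)$. If $\kappa = \min\{i : i \ge \mu\}$, then for all $i$ and $A$,

$$|\Delta f_A(i)| \le \frac{1 - \pi_i}{\alpha_i \wedge \beta_i} \wedge \frac{1}{\alpha_i} \wedge \frac{1}{\beta_i} \le \frac{1}{\sigma^2}.$$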

Remark. For approximating distributions on $\mathbb{Z}_+$ satisfying the balance equation (3.1), [8] (page 1382) proved that, if (3.2) is satisfied, then $|\Delta f_A(i)| \le \frac{1}{\alpha_i} \wedge \frac{1}{\beta_i}$. Under the assumption of nonincreasing $\alpha_i$'s and nondecreasing $\beta_i$'s, [22], Corollary 3.5.1, gives the bound $|\Delta f_A(i)| \le \Delta f_i(i)$, derived similarly to the inequality $\Delta f_i(i) \ge \Delta f_A(i)$ below. Lemma 3.4 is parallel to these types of estimates for the new approximating distribution, which satisfies a version of condition (3.2) appropriately modified for its range.

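Proof of Lemma 3.4. It follows from Lemma 3.3 and the balance equation $\beta_i\pi_i = \alpha_{i-1}\pi_{i-1}$ that
$$\Delta f_j(i) = f_j(i+1) - f_j(i) = \begin{cases} -\pi_j\left(\dfrac{\sum_{l=-\infty}^{i}\pi_l}{\alpha_i\pi_i} - \dfrac{\sum_{l=-\infty}^{i-1}\pi_l}{\alpha_{i-1}\pi_{i-1}}\right), & i<j,\\[3ex] \pi_j\left(\dfrac{\sum_{l=i+1}^{\infty}\pi_l}{\beta_{i+1}\pi_{i+1}} + \dfrac{\sum_{l=-\infty}^{i-1}\pi_l}{\alpha_{i-1}\pi_{i-1}}\right), & i=j,\\[3ex] \pi_j\left(\dfrac{\sum_{l=i+1}^{\infty}\pi_l}{\beta_{i+1}\pi_{i+1}} - \dfrac{\sum_{l=i}^{\infty}\pi_l}{\beta_i\pi_i}\right), & i>j.\end{cases}$$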



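Since $\kappa = \min\{i : i\ge\mu\}$ and, therefore, $\mu\le\kappa<\mu+1$, one can verify directly that $\{\alpha_i, i\in\mathbb{Z}\}$ is nonincreasing and $\{\beta_i, i\in\mathbb{Z}\}$ is nondecreasing. Hence, for $i<j$,
$$\frac{\sum_{l=-\infty}^{i}\pi_l}{\alpha_i\pi_i} - \frac{\sum_{l=-\infty}^{i-1}\pi_l}{\alpha_{i-1}\pi_{i-1}} = \frac{\sum_{l=-\infty}^{i}\pi_l}{\alpha_i\pi_i} - \frac{\sum_{l=-\infty}^{i-1}\pi_l}{\beta_i\pi_i} = \frac{1}{\alpha_i\beta_i\pi_i}\left(\beta_i\sum_{l=-\infty}^{i}\pi_l - \alpha_i\sum_{l=-\infty}^{i-1}\pi_l\right)$$
$$\ge\frac{1}{\alpha_i\beta_i\pi_i}\left(\sum_{l=-\infty}^{i}\beta_l\pi_l - \alpha_i\sum_{l=-\infty}^{i-1}\pi_l\right) = \frac{1}{\alpha_i\beta_i\pi_i}\sum_{l=-\infty}^{i-1}(\alpha_l-\alpha_i)\pi_l\ge 0,$$
where for the first and last equalities we have applied the balance equation $\alpha_i\pi_i = \beta_{i+1}\pi_{i+1}$, and the two inequalities use the monotonicity of $\{\beta_l\}$ and $\{\alpha_l\}$, respectively. Likewise, for $i>j$,
$$\frac{\sum_{l=i+1}^{\infty}\pi_l}{\beta_{i+1}\pi_{i+1}} - \frac{\sum_{l=i}^{\infty}\pi_l}{\beta_i\pi_i} = \frac{\sum_{l=i+1}^{\infty}\pi_l}{\alpha_i\pi_i} - \frac{\sum_{l=i}^{\infty}\pi_l}{\beta_i\pi_i} = \frac{1}{\alpha_i\beta_i\pi_i}\left(\beta_i\sum_{l=i+1}^{\infty}\pi_l - \alpha_i\sum_{l=i}^{\infty}\pi_l\right)$$
$$\le\frac{1}{\alpha_i\beta_i\pi_i}\left(\sum_{l=i+1}^{\infty}\beta_l\pi_l - \alpha_i\sum_{l=i}^{\infty}\pi_l\right) = \frac{1}{\alpha_i\beta_i\pi_i}\sum_{l=i}^{\infty}(\alpha_l-\alpha_i)\pi_l\le 0.$$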



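Hence, $\Delta f_j(i)\le 0$ for $j\ne i$ and $\Delta f_j(i)\ge 0$ for $j=i$, and for any $A\subset\mathbb{Z}$,
$$\Delta f_A(i) = \sum_{j\in A}\Delta f_j(i)\le\Delta f_i(i) = \frac{\sum_{l=i+1}^{\infty}\pi_l}{\alpha_i} + \frac{\sum_{l=-\infty}^{i-1}\pi_l}{\beta_i}\qquad(3.7)$$
$$\le\frac{1}{\alpha_i\wedge\beta_i}\left(\sum_{l=i+1}^{\infty}\pi_l + \sum_{l=-\infty}^{i-1}\pi_l\right) = \frac{1-\pi_i}{\alpha_i\wedge\beta_i}.$$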

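To obtain the other terms in the bound, note that, since $\{\alpha_i, i\in\mathbb{Z}\}$ is nonincreasing and $\{\beta_i, i\in\mathbb{Z}\}$ is nondecreasing, for $l\ge i+1$ we have
$$\pi_l = \frac{\alpha_{l-1}\pi_{l-1}}{\beta_l}\le\frac{\alpha_i\pi_{l-1}}{\beta_i},$$
and for $l\le i-1$,
$$\pi_l = \frac{\beta_{l+1}\pi_{l+1}}{\alpha_l}\le\frac{\beta_i\pi_{l+1}}{\alpha_i},$$
which in turn imply
$$\frac{\sum_{l=i+1}^{\infty}\pi_l}{\alpha_i}\le\frac{\sum_{l=i}^{\infty}\pi_l}{\beta_i}\qquad(3.8)$$
and
$$\frac{\sum_{l=-\infty}^{i-1}\pi_l}{\beta_i}\le\frac{\sum_{l=-\infty}^{i}\pi_l}{\alpha_i}.\qquad(3.9)$$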

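Now, it follows from (3.7) and (3.8) that $\Delta f_A(i)\le 1/\beta_i$, while combining (3.7) and (3.9) gives $\Delta f_A(i)\le 1/\alpha_i$.

On the other hand, since $h_{\mathbb{Z}} = 1 - \mathbb{P}(S\in\mathbb{Z}) = 0$, we have
$$-\Delta f_A(i) = \Delta f_{A^c}(i)\le\frac{1-\pi_i}{\alpha_i\wedge\beta_i}\wedge\frac{1}{\alpha_i}\wedge\frac{1}{\beta_i}.$$
Noting that $\alpha_i$ and $\beta_i$ are both at least $\sigma^2$ for all $i$ completes the proof. □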
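The sign pattern in the proof ($\Delta f_j(i)\le 0$ for $j\ne i$, $\Delta f_i(i)\ge 0$) and the final bound can be checked numerically. Because $f_A = \sum_{j\in A}f_j$, the extreme sets $A$ simply collect the positive or the negative terms $\Delta f_j(i)$, so $\sup_A|\Delta f_A(i)|$ is computable exactly on a finite window. This is a sketch on a truncated window, with rates assumed from the operator $B$ in (2.11): $\alpha_i = \sigma^2$, $\beta_i = \sigma^2 + i - \mu$ for $i\ge\kappa$, and $\alpha_i = \sigma^2 + \mu - i$, $\beta_i = \sigma^2$ for $i<\kappa$. Function names are hypothetical.

```python
import math


def rates(i, mu, sigma2):
    """Assumed (alpha_i, beta_i) with kappa = min{i : i >= mu} = ceil(mu)."""
    kappa = math.ceil(mu)
    if i >= kappa:
        return sigma2, sigma2 + (i - mu)
    return sigma2 + (mu - i), sigma2


def wp_pmf(mu, sigma2, lo, hi):
    """Stationary law on {lo, ..., hi} from alpha_i pi_i = beta_{i+1} pi_{i+1}."""
    logp = [0.0]
    for i in range(lo, hi):
        logp.append(logp[-1] + math.log(rates(i, mu, sigma2)[0])
                    - math.log(rates(i + 1, mu, sigma2)[1]))
    top = max(logp)
    w = [math.exp(x - top) for x in logp]
    total = sum(w)
    return {lo + k: x / total for k, x in enumerate(w)}


def delta_f(i, mu, sigma2, lo, hi):
    """Delta f_j(i) = f_j(i+1) - f_j(i) for all j in {lo, ..., hi}, via Lemma 3.3."""
    pi = wp_pmf(mu, sigma2, lo, hi)

    def f(j, k):
        beta_k = rates(k, mu, sigma2)[1]
        if k <= j:
            return -pi[j] * sum(pi[l] for l in range(lo, k)) / (beta_k * pi[k])
        return pi[j] * sum(pi[l] for l in range(k, hi + 1)) / (beta_k * pi[k])

    return pi, {j: f(j, i + 1) - f(j, i) for j in range(lo, hi + 1)}


def lemma_3_4_check(i, mu, sigma2, lo, hi, tol=1e-12):
    """Sign pattern of Delta f_j(i) and the bound on sup_A |Delta f_A(i)|."""
    pi, d = delta_f(i, mu, sigma2, lo, hi)
    alpha, beta = rates(i, mu, sigma2)
    signs_ok = d[i] >= -tol and all(v <= tol for j, v in d.items() if j != i)
    # The extreme sets A collect the positive or the negative terms.
    sup_abs = max(sum(v for v in d.values() if v > 0),
                  -sum(v for v in d.values() if v < 0))
    bound = min((1 - pi[i]) / min(alpha, beta), 1 / alpha, 1 / beta)
    return signs_ok, sup_abs <= bound + tol, bound <= 1 / sigma2 + tol


print(all(all(lemma_3_4_check(i, 2.4, 3.0, -25, 35)) for i in range(-10, 20)))
```

Checking only interior points $i$ keeps the truncation from mattering; the small tolerance absorbs rounding where (3.9) is nearly an equality far out in the left tail.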

4. Zero biasing and approximation theorems. We define the total variation distance between two probability measures $Q_1, Q_2$ on $\mathbb{Z}$ as follows:
$$d_{TV}(Q_1,Q_2) = \sup_{A\subset\mathbb{Z}}|Q_1(A) - Q_2(A)|.$$
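For distributions on $\mathbb{Z}$ the supremum is attained at $A = \{k : Q_1(k) > Q_2(k)\}$, so $d_{TV}$ equals half the $\ell^1$ distance between the mass functions, which is the convenient form for computation. The brute-force comparison below (illustrative only, small supports; names hypothetical) shows the two agree.

```python
from itertools import combinations


def dtv_sup(q1, q2):
    """sup_A |Q1(A) - Q2(A)| by brute force over all subsets of the joint support."""
    support = sorted(set(q1) | set(q2))
    best = 0.0
    for r in range(len(support) + 1):
        for subset in combinations(support, r):
            gap = sum(q1.get(k, 0.0) - q2.get(k, 0.0) for k in subset)
            best = max(best, abs(gap))
    return best


def dtv_half_l1(q1, q2):
    """(1/2) sum_k |Q1(k) - Q2(k)|."""
    support = set(q1) | set(q2)
    return 0.5 * sum(abs(q1.get(k, 0.0) - q2.get(k, 0.0)) for k in support)


q1 = {0: 0.2, 1: 0.5, 3: 0.3}
q2 = {0: 0.4, 1: 0.1, 2: 0.3, 3: 0.2}
print(dtv_sup(q1, q2), dtv_half_l1(q1, q2))
```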

Using Lemma 3.4, we prove the following general theorems, which parallel results in the continuous case showing that $Y$ is close to normal when $Y$ and $Y^*$ are close. Throughout this section, we write $\wp(\mu,\sigma^2)$ for $\wp_\kappa(\mu,\sigma^2)$ with $\kappa$ chosen as in Section 3, that is,
$$\kappa = \min\{i : i\ge\mu\}.$$

Theorem 4.1. Let $Y$ be an integer-valued random variable with mean $\mu$ and finite variance $\sigma^2$, and let $Y^*$ have the $Y$-zero biased distribution. Then
$$d_{TV}(\mathcal{L}(Y),\wp(\mu,\sigma^2))\le\sum_{i=\kappa}^{\infty}|\mathbb{P}(Y=i) - \mathbb{P}(Y^*=i)| + \sum_{i=-\infty}^{\kappa-1}|\mathbb{P}(Y=i) - \mathbb{P}(Y^*+1=i)|.$$

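Proof. With $h_A$ and $f_A$ as in Lemma 3.4, recalling the form of the operator $B$ in (2.11), we have, by the zero bias property (2.5),
$$|\mathbb{P}(Y\in A) - \mathbb{P}(S\in A)| = |\mathbb{E}h_A(Y)| = |\mathbb{E}\mathcal{A}g_A(Y)| = |\mathbb{E}Bf_A(Y)|$$
$$= |\sigma^2\mathbb{E}\Delta f_A(Y) - \mathbb{E}\{(Y-\mu)f_A(Y)\mathbf{1}_{Y\ge\kappa} + (Y-\mu)f_A(Y+1)\mathbf{1}_{Y\le\kappa-1}\}|$$
$$= \sigma^2|\mathbb{E}\{\Delta f_A(Y)\} - \mathbb{E}\{\Delta(f_A(Y^*)\mathbf{1}_{Y^*\ge\kappa}) + \Delta(f_A(Y^*+1)\mathbf{1}_{Y^*\le\kappa-1})\}|.$$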
However, note that, for any $p$,
$$\Delta(f(i)\mathbf{1}_{i\ge p}) = f(i+1)\mathbf{1}_{i+1\ge p} - f(i)\mathbf{1}_{i\ge p} = [\Delta f(i)]\mathbf{1}_{i\ge p} + f(p)\mathbf{1}_{i=p-1}$$
and
$$\Delta(f(i+1)\mathbf{1}_{i\le p-1}) = f(i+2)\mathbf{1}_{i+1\le p-1} - f(i+1)\mathbf{1}_{i\le p-1} = [\Delta f(i+1)]\mathbf{1}_{i\le p-2} - f(p)\mathbf{1}_{i=p-1}.$$
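The two difference identities, and the cancellation of the $f(p)\mathbf{1}_{i=p-1}$ terms when they are added, can be spot-checked for an arbitrary function. The test function and helper names below are hypothetical choices for illustration.

```python
import math


def delta(g):
    """Forward difference: (Delta g)(i) = g(i + 1) - g(i)."""
    return lambda i: g(i + 1) - g(i)


def sample_f(i):
    # Arbitrary test function (hypothetical choice).
    return math.sin(1.3 * i) + 0.1 * i * i


def identity_gap(p, f=sample_f, span=range(-10, 11)):
    """Largest error in the two displayed identities and in their sum."""
    upper = delta(lambda k: f(k) * (k >= p))          # Delta(f(i) 1{i >= p})
    lower = delta(lambda k: f(k + 1) * (k <= p - 1))  # Delta(f(i+1) 1{i <= p-1})
    worst = 0.0
    for i in span:
        rhs_upper = (f(i + 1) - f(i)) * (i >= p) + f(p) * (i == p - 1)
        rhs_lower = (f(i + 2) - f(i + 1)) * (i <= p - 2) - f(p) * (i == p - 1)
        # In the sum the f(p) 1{i = p-1} terms cancel.
        rhs_sum = (f(i + 1) - f(i)) * (i >= p) + (f(i + 2) - f(i + 1)) * (i <= p - 2)
        worst = max(worst, abs(upper(i) - rhs_upper), abs(lower(i) - rhs_lower),
                    abs(upper(i) + lower(i) - rhs_sum))
    return worst


print([identity_gap(p) for p in (-3, 0, 4)])
```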

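Hence, with the help of the cancellation of the term $f(p)\mathbf{1}_{i=p-1}$, the above expression equals
$$\sigma^2|\mathbb{E}\{\Delta f_A(Y)\} - \mathbb{E}\{[\Delta f_A(Y^*)]\mathbf{1}_{Y^*\ge\kappa} + [\Delta f_A(Y^*+1)]\mathbf{1}_{Y^*\le\kappa-2}\}|$$
$$= \sigma^2\left|\sum_{i=-\infty}^{\infty}\Delta f_A(i)\mathbb{P}(Y=i) - \sum_{i=\kappa}^{\infty}\Delta f_A(i)\mathbb{P}(Y^*=i) - \sum_{i=-\infty}^{\kappa-1}\Delta f_A(i)\mathbb{P}(Y^*+1=i)\right|$$
$$\le\sigma^2\sum_{i=\kappa}^{\infty}|\Delta f_A(i)(\mathbb{P}(Y=i) - \mathbb{P}(Y^*=i))| + \sigma^2\sum_{i=-\infty}^{\kappa-1}|\Delta f_A(i)(\mathbb{P}(Y=i) - \mathbb{P}(Y^*+1=i))|$$
$$\le\sum_{i=\kappa}^{\infty}|\mathbb{P}(Y=i) - \mathbb{P}(Y^*=i)| + \sum_{i=-\infty}^{\kappa-1}|\mathbb{P}(Y=i) - \mathbb{P}(Y^*+1=i)|,$$
where we have applied the bound $|\Delta f_A(i)|\le 1/\sigma^2$ shown in Lemma 3.4. Taking the supremum over $A\subset\mathbb{Z}$ completes the proof. □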
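To see Theorem 4.1 at work on a concrete case, the sketch below takes $Y$ binomial and computes both sides. It uses the zero-bias mass function $\mathbb{P}(Y^*=k) = \mathbb{E}[(Y-\mu)\mathbf{1}\{Y\ge k+1\}]/\sigma^2$, which we derive here by taking $f = \mathbf{1}\{\cdot\ge k+1\}$ in (2.5). It also assumes the rates $\alpha_i = \sigma^2$, $\beta_i = \sigma^2+i-\mu$ for $i\ge\kappa$ and $\alpha_i = \sigma^2+\mu-i$, $\beta_i = \sigma^2$ for $i<\kappa$, read off from the operator $B$ in (2.11), and builds $\wp(\mu,\sigma^2)$ on a wide window. Function names are hypothetical.

```python
import math


def rates(i, mu, sigma2):
    """Assumed (alpha_i, beta_i) with kappa = min{i : i >= mu} = ceil(mu)."""
    kappa = math.ceil(mu)
    if i >= kappa:
        return sigma2, sigma2 + (i - mu)
    return sigma2 + (mu - i), sigma2


def wp_pmf(mu, sigma2, lo, hi):
    """Stationary law on {lo, ..., hi} from alpha_i pi_i = beta_{i+1} pi_{i+1}."""
    logp = [0.0]
    for i in range(lo, hi):
        logp.append(logp[-1] + math.log(rates(i, mu, sigma2)[0])
                    - math.log(rates(i + 1, mu, sigma2)[1]))
    top = max(logp)
    w = [math.exp(x - top) for x in logp]
    total = sum(w)
    return {lo + k: x / total for k, x in enumerate(w)}


def moments(pmf):
    mu = sum(k * q for k, q in pmf.items())
    return mu, sum((k - mu) ** 2 * q for k, q in pmf.items())


def zero_bias_pmf(pmf):
    """P(Y* = k) = E[(Y - mu) 1{Y >= k+1}] / sigma^2 on {min Y, ..., max Y - 1}."""
    mu, var = moments(pmf)
    return {k: sum((y - mu) * q for y, q in pmf.items() if y >= k + 1) / var
            for k in range(min(pmf), max(pmf))}


def theorem_4_1(pmf, pad=60):
    """Return (d_TV(L(Y), wp(mu, sigma^2)), right-hand side of Theorem 4.1)."""
    mu, var = moments(pmf)
    kappa = math.ceil(mu)
    star = zero_bias_pmf(pmf)
    bound = 0.0
    for i in range(min(pmf) - 1, max(pmf) + 2):
        if i >= kappa:
            bound += abs(pmf.get(i, 0.0) - star.get(i, 0.0))
        else:  # P(Y* + 1 = i) = P(Y* = i - 1)
            bound += abs(pmf.get(i, 0.0) - star.get(i - 1, 0.0))
    pi = wp_pmf(mu, var, min(pmf) - pad, max(pmf) + pad)
    dtv = 0.5 * sum(abs(pmf.get(j, 0.0) - pi[j]) for j in pi)
    return dtv, bound


binom = {k: math.comb(12, k) * 0.35 ** k * 0.65 ** (12 - k) for k in range(13)}
print(theorem_4_1(binom))
```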

Before applying Theorem 4.1 to the case where $W$ is a sum, we note that the existence of a finite first moment of $Y^*$ is equivalent to the existence of a finite third moment of $Y$; letting $f(y) = y^2$ in (2.5) gives
$$\mathbb{E}[Y^3 - \mu Y^2] = \sigma^2\mathbb{E}[2Y^*+1].$$
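Since $\Delta f(y) = (y+1)^2 - y^2 = 2y+1$ when $f(y) = y^2$, the identity is easy to confirm numerically. The sketch below uses a small, deliberately asymmetric distribution. The zero-bias mass function $\mathbb{P}(Y^*=k) = \mathbb{E}[(Y-\mu)\mathbf{1}\{Y\ge k+1\}]/\sigma^2$ is our derivation from (2.5) with indicator test functions, and the names are hypothetical.

```python
def moments(pmf):
    mu = sum(k * q for k, q in pmf.items())
    return mu, sum((k - mu) ** 2 * q for k, q in pmf.items())


def zero_bias_pmf(pmf):
    """P(Y* = k) = E[(Y - mu) 1{Y >= k+1}] / sigma^2 on {min Y, ..., max Y - 1}."""
    mu, var = moments(pmf)
    return {k: sum((y - mu) * q for y, q in pmf.items() if y >= k + 1) / var
            for k in range(min(pmf), max(pmf))}


pmf = {-2: 0.1, 0: 0.4, 1: 0.3, 5: 0.2}
mu, var = moments(pmf)
star = zero_bias_pmf(pmf)
lhs = sum((y ** 3 - mu * y ** 2) * q for y, q in pmf.items())   # E[Y^3 - mu Y^2]
rhs = var * sum((2 * k + 1) * q for k, q in star.items())       # sigma^2 E[2Y* + 1]
print(lhs, rhs)
```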

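Theorem 4.2. Let $\xi_i$, $i=1,\ldots,n$, be independent integer-valued random variables and let $W = \sum_{i=1}^n\xi_i$. Then, with $W_i = W - \xi_i$, $\mathrm{Var}(\xi_i) = \sigma_i^2$, and $\xi_i^*$ defined on the same space as $\xi_i$, with the $\xi_i$-zero biased distribution for $i=1,\ldots,n$, and with $\mu = \mathbb{E}(W)$ and $\sigma^2 = \mathrm{Var}(W)$, we have for any $K>0$,
$$d_{TV}(\mathcal{L}(W),\wp(\mu,\sigma^2))\le\frac{2}{\sigma^2}\sum_{i=1}^n\sigma_i^2 d_i^+\left[\mathbb{E}(|\xi_i-\xi_i^*|\wedge K) + \mathbb{E}(|\xi_i-(\xi_i^*+1)|\wedge K)\right]$$
$$+\frac{2}{\sigma^2}\sum_{i=1}^n\sigma_i^2\left[\sum_{|k_1-k_2|>K}\mathbb{P}(\xi_i=k_1,\xi_i^*=k_2) + \sum_{|k_1-k_2|>K}\mathbb{P}(\xi_i=k_1,\xi_i^*+1=k_2)\right],$$
where $d_i^+ = d_{TV}(\mathcal{L}(W_i),\mathcal{L}(W_i+1))$, $i=1,\ldots,n$. In particular, letting $K\uparrow\infty$,
$$d_{TV}(\mathcal{L}(W),\wp(\mu,\sigma^2))\le\frac{2}{\sigma^2}\sum_{i=1}^n\sigma_i^2 d_i^+\,\mathbb{E}\left[|\xi_i-\xi_i^*| + |\xi_i-(\xi_i^*+1)|\right],\qquad(4.1)$$
which will be finite when $\mathbb{E}|\xi_i|^3<\infty$, $i=1,\ldots,n$.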

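Proof. Considering the first sum in the bound of Theorem 4.1, by invoking (2.4) we have
$$\sum_{j=\kappa}^{\infty}|\mathbb{P}(W=j) - \mathbb{P}(W^*=j)|\le\sum_{i=1}^n\frac{\sigma_i^2}{\sigma^2}\sum_{j=\kappa}^{\infty}|\mathbb{P}(W_i+\xi_i=j) - \mathbb{P}(W_i+\xi_i^*=j)|$$
$$\le\sum_{i=1}^n\frac{\sigma_i^2}{\sigma^2}\sum_{j=\kappa}^{\infty}\sum_{k_1,k_2}|\mathbb{P}(W_i=j+k_1) - \mathbb{P}(W_i=j+k_2)|\,\mathbb{P}(\xi_i=-k_1,\xi_i^*=-k_2)$$
$$\le\sum_{i=1}^n\frac{\sigma_i^2}{\sigma^2}\sum_{j=\kappa}^{\infty}\sum_{|k_1-k_2|\le K}\ \sum_{l=k_1\wedge k_2}^{k_1\vee k_2-1}|\mathbb{P}(W_i=j+l) - \mathbb{P}(W_i=j+l+1)|\,\mathbb{P}(\xi_i=-k_1,\xi_i^*=-k_2)$$
$$\quad+\sum_{i=1}^n\frac{\sigma_i^2}{\sigma^2}\sum_{j=\kappa}^{\infty}\sum_{|k_1-k_2|>K}[\mathbb{P}(W_i=j+k_1) + \mathbb{P}(W_i=j+k_2)]\,\mathbb{P}(\xi_i=-k_1,\xi_i^*=-k_2)$$
$$\le\frac{2}{\sigma^2}\sum_{i=1}^n\sigma_i^2\left(d_i^+\sum_{|k_1-k_2|\le K}|k_1-k_2|\,\mathbb{P}(\xi_i=-k_1,\xi_i^*=-k_2) + \sum_{|k_1-k_2|>K}\mathbb{P}(\xi_i=-k_1,\xi_i^*=-k_2)\right)$$
$$\le\frac{2}{\sigma^2}\sum_{i=1}^n\sigma_i^2\left(d_i^+\,\mathbb{E}(|\xi_i-\xi_i^*|\wedge K) + \sum_{|k_1-k_2|>K}\mathbb{P}(\xi_i=k_1,\xi_i^*=k_2)\right),$$
where in the penultimate step we used $\sum_m|\mathbb{P}(W_i=m) - \mathbb{P}(W_i=m+1)| = 2d_i^+$ and $\sum_j\mathbb{P}(W_i=j+k)\le 1$; the bound on the remaining sum can be shown similarly. □

Remark. When $W$ is the sum of many terms of comparable order, the bound in (4.1) is small when $d_i^+$, $i=1,\ldots,n$, are small, which is ensured by the condition that, with large probability, $W$ is not concentrated on a lattice of span greater than 1; see Remark 4.5.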
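The following sketch evaluates the right side of (4.1) for a sum of $n$ i.i.d. aperiodic variables and compares it with the exact $d_{TV}(\mathcal{L}(W),\wp(\mu,\sigma^2))$. Theorem 4.2 allows any coupling of $\xi_i$ and $\xi_i^*$ on a common space; we assume the independent coupling purely because it is easy to compute (a tighter coupling would only shrink the bound). As elsewhere, the rates $\alpha_i = \sigma^2$, $\beta_i = \sigma^2+i-\mu$ ($i\ge\kappa$) and $\alpha_i = \sigma^2+\mu-i$, $\beta_i = \sigma^2$ ($i<\kappa$) are an assumption read off from the operator $B$ in (2.11). The example distribution and all names are hypothetical.

```python
import math


def rates(i, mu, sigma2):
    kappa = math.ceil(mu)  # kappa = min{i : i >= mu}; assumed rates
    if i >= kappa:
        return sigma2, sigma2 + (i - mu)
    return sigma2 + (mu - i), sigma2


def wp_pmf(mu, sigma2, lo, hi):
    logp = [0.0]  # log pi from alpha_i pi_i = beta_{i+1} pi_{i+1}
    for i in range(lo, hi):
        logp.append(logp[-1] + math.log(rates(i, mu, sigma2)[0])
                    - math.log(rates(i + 1, mu, sigma2)[1]))
    top = max(logp)
    w = [math.exp(x - top) for x in logp]
    total = sum(w)
    return {lo + k: x / total for k, x in enumerate(w)}


def moments(pmf):
    mu = sum(k * q for k, q in pmf.items())
    return mu, sum((k - mu) ** 2 * q for k, q in pmf.items())


def zero_bias_pmf(pmf):
    mu, var = moments(pmf)
    return {k: sum((y - mu) * q for y, q in pmf.items() if y >= k + 1) / var
            for k in range(min(pmf), max(pmf))}


def convolve(p, q):
    out = {}
    for a, pa in p.items():
        for b, qb in q.items():
            out[a + b] = out.get(a + b, 0.0) + pa * qb
    return out


def n_fold(pmf, n):
    out = {0: 1.0}
    for _ in range(n):
        out = convolve(out, pmf)
    return out


def shift_distance(pmf):
    """d_TV(L(X), L(X + 1)) = (1/2) sum_m |P(X = m) - P(X = m - 1)|."""
    return 0.5 * sum(abs(pmf.get(m, 0.0) - pmf.get(m - 1, 0.0))
                     for m in range(min(pmf), max(pmf) + 2))


def bound_4_1_iid(xi, n):
    """Right side of (4.1) for n i.i.d. copies of xi, with xi* independent of xi."""
    star = zero_bias_pmf(xi)
    d_plus = shift_distance(n_fold(xi, n - 1))   # the same d_i^+ for every i
    gap = sum(p * q * (abs(a - b) + abs(a - b - 1))
              for a, p in xi.items() for b, q in star.items())
    return 2.0 * d_plus * gap   # (2/sigma^2) sum_i sigma_i^2 (...) with equal terms


def dtv_to_wp(pmf, pad=80):
    mu, var = moments(pmf)
    pi = wp_pmf(mu, var, min(pmf) - pad, max(pmf) + pad)
    return 0.5 * sum(abs(pmf.get(j, 0.0) - pi[j]) for j in pi)


xi = {0: 0.5, 1: 0.3, 3: 0.2}
w = n_fold(xi, 30)
print(dtv_to_wp(w), bound_4_1_iid(xi, 30))
```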

Remark. Note that no signed measures, truncation or translation are required, in contrast to Barbour and Xia [6], Čekanavičius and Vaitkus [9] and Barbour and Choi [3].

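Corollary 4.3. Let $I_i$, $i=1,\ldots,n$, be independent indicator random variables with
$$\mathbb{P}(I_i=1) = 1 - \mathbb{P}(I_i=0) = p_i,\qquad i=1,\ldots,n,$$
and let
$$W = \sum_{i=1}^n I_i,\qquad\mu = \sum_{i=1}^n p_i,\qquad\sigma^2 = \sum_{i=1}^n p_i(1-p_i)\qquad\text{and}\qquad\tilde\sigma^2 = \sigma^2 - \max_i p_i(1-p_i).$$
Then
$$d_{TV}(\mathcal{L}(W),\wp(\mu,\sigma^2))\le\frac{1}{\tilde\sigma}.$$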



As for approximations using the central limit theorem, we do not assume the $p_i$'s to be small; the bound here has the same order as those in the classical central limit theorem, polynomial birth-death approximation [8] and compound Poisson signed measure approximation [6]. Moreover, no additional assumptions are required, as in [8], and no signed measures are needed, as in [6].

Proof of Corollary 4.3. Since $W_i$ is unimodal in this case ([25], page 1273), we have
$$d_{TV}(\mathcal{L}(W_i),\mathcal{L}(W_i+1))\le\max_j\mathbb{P}(W_i=j)\le\frac{1}{2}\{\sigma^2 - p_i(1-p_i)\}^{-1/2}\le\frac{1}{2\tilde\sigma},$$
where the second inequality is due to Barbour and Jensen [5], page 78. Since $I_i^* = 0$ by (2.7), we have
$$\mathbb{E}|I_i - I_i^*| = p_i,\qquad\mathbb{E}|I_i - (I_i^*+1)| = 1 - p_i,$$
and it follows from (4.1), with $\sigma_i^2 = p_i(1-p_i)$, that
$$d_{TV}(\mathcal{L}(W),\wp(\mu,\sigma^2))\le\frac{1}{\tilde\sigma\sigma^2}\sum_{i=1}^n\sigma_i^2\left(\mathbb{E}|I_i - I_i^*| + \mathbb{E}|I_i - (I_i^*+1)|\right) = \frac{1}{\tilde\sigma}.\qquad\square$$

Remark. Note that the proofs do not depend on the ordering of the index set $\{1,\ldots,n\}$ of the $\xi_i$'s, so one may apply the approximation theorems to sums of independent integer-valued random variables over an arbitrary index set.
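For indicators the ingredients of (4.1) are explicit: $I_i^* = 0$, so $\mathbb{E}|I_i-I_i^*| + \mathbb{E}|I_i-(I_i^*+1)| = 1$, and, by unimodality, $d_i^+$ equals the largest point mass of $W_i$. The sketch below checks the latter on an example and compares the resulting value of (4.1) with the exact distance. The probabilities $p_i$ are arbitrary illustrative values. The rates of $\wp(\mu,\sigma^2)$, $\alpha_i = \sigma^2$, $\beta_i = \sigma^2+i-\mu$ ($i\ge\kappa$) and $\alpha_i = \sigma^2+\mu-i$, $\beta_i = \sigma^2$ ($i<\kappa$), are an assumption read off from the operator $B$ in (2.11). Names are hypothetical.

```python
import math


def rates(i, mu, sigma2):
    kappa = math.ceil(mu)  # kappa = min{i : i >= mu}; assumed rates
    if i >= kappa:
        return sigma2, sigma2 + (i - mu)
    return sigma2 + (mu - i), sigma2


def wp_pmf(mu, sigma2, lo, hi):
    logp = [0.0]  # log pi from alpha_i pi_i = beta_{i+1} pi_{i+1}
    for i in range(lo, hi):
        logp.append(logp[-1] + math.log(rates(i, mu, sigma2)[0])
                    - math.log(rates(i + 1, mu, sigma2)[1]))
    top = max(logp)
    w = [math.exp(x - top) for x in logp]
    total = sum(w)
    return {lo + k: x / total for k, x in enumerate(w)}


def convolve(p, q):
    out = {}
    for a, pa in p.items():
        for b, qb in q.items():
            out[a + b] = out.get(a + b, 0.0) + pa * qb
    return out


def bernoulli_sum(ps):
    out = {0: 1.0}
    for p in ps:
        out = convolve(out, {0: 1.0 - p, 1: p})
    return out


def shift_distance(pmf):
    """d_TV(L(X), L(X + 1)) = (1/2) sum_m |P(X = m) - P(X = m - 1)|."""
    return 0.5 * sum(abs(pmf.get(m, 0.0) - pmf.get(m - 1, 0.0))
                     for m in range(min(pmf), max(pmf) + 2))


ps = [0.1, 0.5, 0.3, 0.8, 0.6, 0.45, 0.2, 0.7, 0.35, 0.55, 0.9, 0.4]
w = bernoulli_sum(ps)
mu = sum(k * q for k, q in w.items())
var = sum((k - mu) ** 2 * q for k, q in w.items())
d_plus, max_mass = [], []
for i in range(len(ps)):
    w_i = bernoulli_sum(ps[:i] + ps[i + 1:])   # W_i = W - I_i
    d_plus.append(shift_distance(w_i))
    max_mass.append(max(w_i.values()))
# (4.1) with E|I - I*| + E|I - (I* + 1)| = 1 for every i
bound = 2.0 / var * sum(p * (1 - p) * d for p, d in zip(ps, d_plus))
pi = wp_pmf(mu, var, -60, len(ps) + 60)
dtv = 0.5 * sum(abs(w.get(j, 0.0) - pi[j]) for j in pi)
print(dtv, bound)
```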

To estimate $d_i^+$ in general, one may apply Proposition 4.6 of [6], quoted below.

Proposition 4.4. Suppose that $\xi_i$, $1\le i\le n$, are independent integer-valued random variables, and set $u_i = 1 - d_{TV}(\mathcal{L}(\xi_i),\mathcal{L}(\xi_i+1))$ and $U = \sum_{i=1}^n\min\{u_i, 1/2\}$. Then, if $W = \sum_{i=1}^n\xi_i$, we have
$$d_{TV}(\mathcal{L}(W),\mathcal{L}(W+1))\le U^{-1/2}.$$
Hence, with $W_i = W - \xi_i$,
$$\max_{1\le i\le n}d_{TV}(\mathcal{L}(W_i),\mathcal{L}(W_i+1))\le(U-1)^{-1/2}.$$
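The quantities in Proposition 4.4 are straightforward to compute for a concrete example. The sketch below evaluates $U$ for $n$ i.i.d. copies of an aperiodic variable and compares both displayed bounds with the exact shift distances obtained by convolution. The example distribution and the function names are hypothetical.

```python
def convolve(p, q):
    out = {}
    for a, pa in p.items():
        for b, qb in q.items():
            out[a + b] = out.get(a + b, 0.0) + pa * qb
    return out


def n_fold(pmf, n):
    out = {0: 1.0}
    for _ in range(n):
        out = convolve(out, pmf)
    return out


def shift_distance(pmf):
    """d_TV(L(X), L(X + 1)) = (1/2) sum_m |P(X = m) - P(X = m - 1)|."""
    return 0.5 * sum(abs(pmf.get(m, 0.0) - pmf.get(m - 1, 0.0))
                     for m in range(min(pmf), max(pmf) + 2))


def smoothness_u(pmfs):
    """U = sum_i min{u_i, 1/2} with u_i = 1 - d_TV(L(xi_i), L(xi_i + 1))."""
    return sum(min(1.0 - shift_distance(p), 0.5) for p in pmfs)


xi = {0: 0.5, 1: 0.3, 3: 0.2}
n = 30
U = smoothness_u([xi] * n)
lhs = shift_distance(n_fold(xi, n))        # d_TV(L(W), L(W + 1))
lhs_i = shift_distance(n_fold(xi, n - 1))  # d_TV(L(W_i), L(W_i + 1))
print(lhs, U ** -0.5, lhs_i, (U - 1) ** -0.5)
```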

Remark. As discussed in [24], Section 11.12-14, $d_{TV}(\mathcal{L}(W),\mathcal{L}(W+1))$ is of order $n^{-1/2}$ when $\xi_i$, $i=1,\ldots,n$, are independent and identically distributed with an aperiodic distribution.

Remark 4.5. The assumption of aperiodicity is essential here, where the total variation metric is used. To see why, take $\xi_i$, $i=1,\ldots,n$, independent with distribution $\mathbb{P}(\xi_i=0) = \mathbb{P}(\xi_i=3) = 1/2$. Then, with probability one, $W$ is concentrated on $\{0,3,6,\ldots\}$, a lattice of span greater than 1, and
$$d_{TV}(\mathcal{L}(W),\wp(\mu,\sigma^2)) = \frac{1}{2}\sum_j|\mathbb{P}(W=j) - \mathbb{P}(S=j)|\ge\frac{1}{2}\sum_{j\notin\{0,3,\ldots\}}\mathbb{P}(S=j),$$
which does not tend to zero as $n\to\infty$.

If one wants to lift the assumption of aperiodicity, it is essential to weaken the metric to the Kolmogorov metric, in which case, provided the relevant higher moments of the $\xi_i$'s (e.g., the third moments) exist, the Berry-Esseen theorem would suffice.
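The lattice example can be made concrete. For $\xi_i\in\{0,3\}$ with equal probabilities we have $\mu = 1.5n$ and $\sigma^2 = 2.25n$. The sketch below computes the exact distance together with the off-lattice mass of $S$; the latter stays near $2/3$, so the distance does not shrink. The rates of $\wp(\mu,\sigma^2)$, $\alpha_i = \sigma^2$, $\beta_i = \sigma^2+i-\mu$ ($i\ge\kappa$) and $\alpha_i = \sigma^2+\mu-i$, $\beta_i = \sigma^2$ ($i<\kappa$), are an assumption read off from the operator $B$ in (2.11), and the window is a hypothetical truncation.

```python
import math


def rates(i, mu, sigma2):
    kappa = math.ceil(mu)  # kappa = min{i : i >= mu}; assumed rates
    if i >= kappa:
        return sigma2, sigma2 + (i - mu)
    return sigma2 + (mu - i), sigma2


def wp_pmf(mu, sigma2, lo, hi):
    logp = [0.0]  # log pi from alpha_i pi_i = beta_{i+1} pi_{i+1}
    for i in range(lo, hi):
        logp.append(logp[-1] + math.log(rates(i, mu, sigma2)[0])
                    - math.log(rates(i + 1, mu, sigma2)[1]))
    top = max(logp)
    w = [math.exp(x - top) for x in logp]
    total = sum(w)
    return {lo + k: x / total for k, x in enumerate(w)}


def n_fold(pmf, n):
    out = {0: 1.0}
    for _ in range(n):
        nxt = {}
        for a, pa in out.items():
            for b, qb in pmf.items():
                nxt[a + b] = nxt.get(a + b, 0.0) + pa * qb
        out = nxt
    return out


n = 40
w = n_fold({0: 0.5, 3: 0.5}, n)      # W lives on {0, 3, ..., 3n}
mu, var = 1.5 * n, 2.25 * n
pi = wp_pmf(mu, var, -200, 3 * n + 200)
dtv = 0.5 * sum(abs(w.get(j, 0.0) - pi[j]) for j in pi)
off_lattice = sum(q for j, q in pi.items() if j % 3 != 0)
print(dtv, off_lattice)
```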

Remark. When $\kappa = \min\{i : i\ge\mu\}$, the variance of $S$ does not match that of the sum $W$ of $n$ independent and identically distributed integer-valued random variables; however, crude estimates show that $\mathrm{Var}(S)/\mathrm{Var}(W)$ approaches 1 as $n\to\infty$. It is hoped that future research will address this issue and sharpen the estimates of the approximation errors.
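The variance mismatch can be seen numerically. The sketch below computes $\mathrm{Var}(S)/\sigma^2$ for $S\sim\wp(\mu,\sigma^2)$ at several values of $\sigma^2 = \mathrm{Var}(W)$, on a window of about 20 standard deviations. The ratio differs from 1 but moves toward it as $\sigma^2$ grows, consistent with the remark. The rates $\alpha_i = \sigma^2$, $\beta_i = \sigma^2+i-\mu$ ($i\ge\kappa$) and $\alpha_i = \sigma^2+\mu-i$, $\beta_i = \sigma^2$ ($i<\kappa$) are an assumption read off from the operator $B$ in (2.11). The value of $\mu$ and the names are hypothetical.

```python
import math


def rates(i, mu, sigma2):
    kappa = math.ceil(mu)  # kappa = min{i : i >= mu}; assumed rates
    if i >= kappa:
        return sigma2, sigma2 + (i - mu)
    return sigma2 + (mu - i), sigma2


def wp_pmf(mu, sigma2, lo, hi):
    logp = [0.0]  # log pi from alpha_i pi_i = beta_{i+1} pi_{i+1}
    for i in range(lo, hi):
        logp.append(logp[-1] + math.log(rates(i, mu, sigma2)[0])
                    - math.log(rates(i + 1, mu, sigma2)[1]))
    top = max(logp)
    w = [math.exp(x - top) for x in logp]
    total = sum(w)
    return {lo + k: x / total for k, x in enumerate(w)}


def variance_ratio(mu, sigma2, width=20.0):
    """Var(S) / sigma^2 for S ~ wp(mu, sigma^2), computed on a wide window."""
    sd = math.sqrt(sigma2)
    lo = math.floor(mu - width * sd) - 10
    hi = math.ceil(mu + width * sd) + 10
    pi = wp_pmf(mu, sigma2, lo, hi)
    m = sum(k * q for k, q in pi.items())
    return sum((k - m) ** 2 * q for k, q in pi.items()) / sigma2


for s2 in (25.0, 100.0, 400.0, 2500.0):
    print(s2, variance_ratio(10.3, s2))
```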